PREFACE


The science of statistics in this country seems to have been subject 
to two main influences. The biometric school associated with
Professor Karl Pearson have developed methods and concepts which
form the basis of the whole subject, and in more recent years the needs 
of biological experimentalists have been met by developments in 
the theory, due largely to Dr. R. A. Fisher. There are many text- 
books on what may be regarded as the classical theory of statistics, 
and Fisher’s own methods are described in his book, Statistical 
Methods for Research Workers. In the present book I have attempted 
to present a single system of statistics, so that a reader with little
previous acquaintance may obtain a good working knowledge and 
understanding of the methods available. 

The first chapters deal with frequency distributions and constants, 
and with the theory of errors, in orthodox manner, but in the later 
chapters the underlying theme is Fisher’s idea of the Analysis of 
Variance; correlation is introduced as a special case of this. There 
are, of course, other ways of regarding the subject, but its unity 
seems to me to be brought out more by this than by any other 
method of presentation. 

In outlining the theory underlying the methods described, I have 
given mathematical proofs where they are easy; but where they are 
not, I have not hesitated merely to state the results. The necessary 
mathematical attainments of readers are slight and include algebra 
up to the binomial theorem and the elementary notions of the 
calculus; even those who have to omit the mathematics entirely 
should be able to arrive at an appreciation of the arguments. Never- 
theless, the theory of statistics is not easy, not so much because it is 
abstruse, as because the ideas are new to most people, and a good 
deal of hard thinking and patient work will be necessary. 

I have drawn largely on the literature for examples, and it will be 
obvious to biologists that I am not one, for some of the statistical 
deductions I make are of no biological interest, although the methods 
applied to other inquiries may be of great value. Statistics is a 
tool, and the good craftsman knows what he wants to make with 
his tool before he lifts it; it is only the small boy who uses it
promiscuously. In this book I have often been like the small boy, but if
readers will be small boys with me, they will learn how to handle
the tool, and then will be able to apply it with biological judgment.
There is no better way of learning statistics than by working through 
examples. 

I wish to thank Mr. F. D. Farrow, Mr. H. A. Hancock and 
Miss R. Hodgson for their help and criticisms, and Mr. L. D. 
Galloway and Mr. J. Gregory for providing me with unpublished 
material for examples. 

L. H. C. TIPPETT 


August, 1931


PREFACE TO SECOND EDITION 

For this edition I have revised the book considerably, and in par- 
ticular have re-written the first three chapters. There is now more 
mathematical theory than before, and although I have not tried to 
prove everything, I have aimed at giving typical proofs to show in 
what way the various distributions and methods arise. These proofs 
are in separate sub-sections, and may be omitted if desired. There 
are other additions that call for no special comment. 

Several topics that were dealt with in the first edition are now 
excluded. Tests of significance based on the correlation ratio are 
omitted because they are less convenient than the equivalent z-tests,
and the intra-class correlation coefficient because it is now only 
of historical interest. I have not included the theory of ranking 
because (1) it is not often needed, (2) when required it is usually
for application to small samples whereas the theory given before 
was for large samples, and (3) I am not aware of any convenient 
methods involving ranking that are as well founded as the other 
methods given in this book. 

I wish to acknowledge with thanks the criticisms, suggestions and 
corrections received from many friends and correspondents, and am 
particularly indebted to Dr. A. J. Turner for reading and comment- 
ing on the manuscript of this revised edition. 


June, 1937


L. H. C. T. 



PREFACE TO THIRD EDITION 

Every year I become aware of new directions in which I think this
book could be improved, and developments in the subject of statistics 
also make periodical revisions desirable. The changes I would wish 
to make at this stage, however, are not sufficiently important to 
justify any considerable amount of resetting of the type, particularly 
as “there’s a war on,” and so this edition is the same as the second
except for a few minor corrections, and one important addition.

The addition is a set of tables, given at the end of the book, of 
the normal probability integral, t and tables based on Fisher’s z 
for testing the significance of differences between variances. These 
tables are a shortened and, in some instances, slightly modified form 
of those in R. A. Fisher’s Statistical Methods for Research Workers, 
and are sufficient for many of the experimenter’s everyday require- 
ments. I acknowledge with pleasure the generosity of Professor 
Fisher and his publishers, Messrs. Oliver & Boyd, in allowing me 
to reproduce these tables. 

L. H. C. T. 


March, 1941 
THE METHODS OF STATISTICS


INTRODUCTION 

Experimental and Statistical Methods Compared 

Methods of scientific inquiry may be classified in several ways, 
one of which has two classes commonly called the experimental and 
the statistical methods. The experimental method consists in isolating 
the factors that are under investigation, perhaps varying one or two 
of them in a determinate manner and keeping all the irrelevant 
factors relatively constant; observations are then made on the 
resulting system. This method serves for most physical and chemical 
investigations. The so-called statistical method is applied where the 
variations in the factors cannot be controlled and the observer has 
to take observations of things as they are and make what inferences 
he can from the data. This, in a considerable degree, is the con- 
dition under which most biologists work. The results of an investi- 
gation in which the experimental method has been successfully 
applied are comparatively simple, and in dealing with them there is 
scarcely any need for an elaborate calculus; but the data from a 
statistical inquiry are complex, and special methods of reduction 
and interpretation are necessary. These are the methods of statistics. 

In actual fact, however, no investigation is purely experimental 
or statistical, using the words in the above restricted sense, for 
whereas perfect control is unattainable, all variations can be con- 
trolled in some degree, either by applying experimental methods 
or by selection and arrangement. Partial experimental control occurs 
in a manurial field trial, where the agriculturalist varies the quantity 
and kind of fertiliser, and although he endeavours to keep other
factors such as irrigation and cultivation constant, there are some he 
cannot control. There is control by selection and arrangement in an 
investigation where children are grouped according to age and the 
mean height of each group is calculated to determine the relation 
between height and age. To make this book of practical use we follow 
the common practice of extending the scope of the words statistics and 
statistical to cover all investigations in which there is any important
degree of uncontrollable variation, whether or not some control is 
possible. This means that the methods described here have a very 
wide field of application. Some of the most important advances in
statistical theory during the past decade or so have been in the 
direction of developing methods that are applicable to the composite 
kind of investigation in which there is partial control. 

Terminology 

Modern statistical methods were first developed largely for the 
study of problems of heredity by making measurements on the 
human population. The data consisted of measurements of some 
character or attribute such as height made on a large number of
people termed individuals. Usually several hundred individuals were 
measured and these were regarded as comprising a sample repre- 
sentative of a much larger collection which was called the population 
or the universe. These terms have an obvious significance when 
applied to human beings; in statistics they are generalised and are
used to refer to the corresponding features of collections of data
about any subject. Several characters may be measured on the same 
individual, and the individual may be a single being or a group; 
for instance, if the character is the size of families, the individuals 
are families. The description of the character or attribute is called 
the variable or variate,* which may be quantitative or qualitative, 
continuous or discrete. A variate is quantitative when it can be 
expressed in numbers (e.g. height in inches) and qualitative when it 
can only be described in broad classes (e.g. hair colour); it is con- 
tinuous when the steps by which it may vary from one value to 
another are indefinitely small (e.g. height) and discrete when the 
steps are finite (e.g. number of petals on a flower). 

Populations 

One of the aims of statistical methods is to describe the
characteristics of populations, and readers must arrive at the conception
of a population as a statistical entity that is made up of individuals 
and yet is something more than their sum. When the individuals 
do not vary in character, the population is merely their sum, but 
when they vary, the population may be something more and may 
have properties the individuals do not possess. For example, the 

* There appears to be no precise distinction between the two terms, 
unless it is that the general character is the variable (e.g. height), and when 
specified by a measurement it is the variate (e.g. height of Englishmen in 
average number per family in a large group of families may be
3.67 (say), but no individual family can include a fraction of a
person; or again, the strength of a chain is certainly not the sum 
nor the average of the strengths of the links. The conception is 
essential to some physical theories in which matter is regarded as an 
aggregate of elemental units and has properties different from, but 
theoretically deducible from those of the units; in statistical language, 
matter is a population and the elemental units (atoms, molecules, 
electrons) are individuals. 

An important property of a population is that its characteristics 
are relatively stable and predictable, whereas the individuals are 
variable, and within the range of variation are unpredictable. The 
classic example of this is life insurance; life is uncertain and it is 
impossible to state when any one person will die, but an insurance 
company dealing with large numbers of people can adopt a sound 
financial policy because the incidence of death in a large group (the
statistical population) is comparatively certain.

The experimentalist usually encounters uncontrollable variation in 
the form of experimental error. When measuring a quantity, he may 
make several determinations and attempt to minimise the effect of 
the errors by taking an average, which is regarded as the true value 
of the quantity; the variations about that value, being errors, are 
often regarded as having no significance. This is not the attitude of 
the statistician to a collection of data. There are no “true value” and
“errors”; all the readings are equally real and significant, and the
variations are as important characteristics of the population as the 
average. 

Theory of Errors 

A second aim of statistical methods is to describe the variations 
between samples, and hence the differences between samples and the 
corresponding populations, so that inferences regarding the popu- 
lation may be made from observations on the sample. This branch 
of the subject is called the Theory of Errors. The term errors is
probably a legacy of the early association of the subject with the 
needs of experimentalists, and is unfortunate; in our view these 
variations are not to be considered as errors in the usual sense. We 
shall follow custom, and use the term in the following pages, but it 
would be well if readers would usually substitute some such para- 
phrase as deviations or variations. 
The theory of errors ultimately reduces all sampling experience
to a common measure, a probability, and up to that point it is 
exact. It is true that simplifying approximations are made in dealing 
with large samples, and that inaccuracies may arise because the 
assumptions are not always quite satisfied in practical experience; 
but precise theory has been developed for small samples, and the 
latter inaccuracies are not usually serious. In any case, these 
inaccuracies are not inherent in the theory, and it is important that 
the reader should not start off with the impression that statistics 
necessarily lacks precision in its ideas. There must, however, be 
uncertainties in dealing with variable quantities, but these lie in the 
interpretation of the probabilities obtained. When they relate to a 
single experience, chances of 20 to 1 are not very different from
chances of 25 to 1 (although when applied to a large number of
experiences the difference is important), and it is in this that the lack 
of precision lies. Fisher (1935) draws a distinction between inferences 
that are uncertain and those that are not mathematically rigorous. 
Statistical inferences are uncertain, but they may be rigorous. 

The view that statistical methods are only suitable for large 
samples is mistaken. All methods of presenting the information 
contained in a sample are statistical, and when the sample is small 
it is all the more necessary to use the soundest treatment, so that 
false conclusions may be avoided. 

Readers will need books of tables which are referred to in the 
text, and the following seems to be the minimum equipment. 

Barlow’s Tables of Squares, etc. (3rd edition, 1930).

Tables for Statisticians and Biometricians, Part I (3rd edition,
1931), edited by K. Pearson. 

Statistical Methods for Research Workers (7th edition, 1938), by 
R. A. Fisher, contains several essential tables, in addition to a
more advanced treatment of many of the methods described here. 

Statistical Tables for Biological, Agricultural and Medical Research
Workers (1st edition, 1938), by R. A. Fisher and F. Yates,
contains all the tables of the above book and also a number 
of other useful ones. 

A table of natural logarithms will also be found useful. 

Statistical work always involves a lot of computing, and a
calculating machine is very useful. These may be very expensive, but
FREQUENCY DISTRIBUTIONS AND CONSTANTS

In this chapter we shall only be concerned with part of the first 
aim of statistics: the description of populations or large collections
of data consisting of readings of a single character made on indi- 
viduals taken at random. In its undigested form, such a collection 
of data is difficult to present and interpret; it is impossible to see 
the wood for the trees. As in the formation of any scientific theory 
or description of any complex phenomena, it is necessary to sum- 
marise, or isolate a few important features for description, discarding 
those that are irrelevant. There are two main stages in the process
of statistical summarisation: (1) the orderly presentation of the data
in the form of tables and diagrams, and (2) the calculation of 
statistical constants that describe single features of the data; these 
are dealt with in turn. 

In this chapter, there will be no need to distinguish between 
populations and samples, and we shall regard them as equivalent. 
We may note, however, the common experience that samples may 
differ somewhat from the populations and may show idiosyncrasies 
that are of no practical significance. 

Frequency Distributions 

1.1. As an example of a collection of random observations we may 
consider the data in Table 1.3, ignoring the rows labelled Means 
and Ranges. The readings are of some unspecified character. Here, 
the order in which the data are recorded is of no importance, and 
we need only pay attention to their magnitude.* Further, the
readings in Table 1.3 vary between 27 and 72 units, and compared
with this range it is not important to know any individual value
correct to one unit; indeed for most practical purposes it is enough
to know the value correct to three units, say. Hence, the whole
range may be divided into a number of sub-ranges each containing
three units: 27-29, 30-32, 33-35, . . . 72-74, and the number or
frequency of individuals in each sub-range may be recorded, thus
giving a frequency table. There are several examples of frequency
tables in Table 1.1. The first of these shows that only
one recruit was between 51 and 52 inches in height, one was
between 52 and 53 inches, two were between 53 and 54 inches,
and so on.

* The order is not always irrelevant. It is all-important in a time-series
(Chapter IX).
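
The grouping into sub-ranges of three units may be sketched in Python as follows. This is only an illustration: the readings are invented (they are not the data of Table 1.3), and the function name frequency_table is an assumption of the sketch. It supposes that every reading is a whole number not less than the starting value.

```python
from collections import Counter


def frequency_table(readings, start, width):
    """Count readings in consecutive sub-ranges of `width` units from `start`."""
    # Index of the sub-range each reading falls in (0 for the first sub-range).
    counts = Counter((r - start) // width for r in readings)
    table = []
    for k in range(max(counts) + 1):
        low = start + k * width
        high = low + width - 1  # nominal upper limit for whole-unit readings
        table.append((low, high, counts.get(k, 0)))
    return table


# Hypothetical readings, each correct to one unit (not the data of Table 1.3).
readings = [27, 31, 34, 35, 36, 36, 38, 40, 41, 41, 43, 47, 52, 72]

for low, high, freq in frequency_table(readings, start=27, width=3):
    print(f"{low}-{high}: {freq}")
```

Empty sub-ranges are kept in the table with a frequency of zero, so that the gaps in a sparse sample remain visible.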

Graphical representation of a distribution as a frequency diagram 
is useful for aiding the appreciation of its features, and this may be 
done in three ways. The first method is to mark off distances along
the x-axis proportional to the variate, to raise perpendiculars or
ordinates proportional to the frequency, and to join their ends by
straight lines, thus forming a frequency polygon. The second method
is to measure along the x-axis distances proportional to the character,
and to raise rectangles proportional in area to the number of
individuals falling within each group, the width occupying the extent
of the sub-range on the horizontal scale; the resulting figure is called
a histogram. These two methods are practically equivalent when the
sub-ranges are equal, but when this is not so, the histogram must
be used. In example (iii) of Table 1.1, which gives the distribution
of distances between reversals in the spiral form of the cotton hair,
the sub-ranges are smaller for the shorter distances than for the
longer ones; when the sub-range is 2 units, the height of the
rectangle is 0.5 for every unit of frequency; when the sub-range is 10,
the corresponding height is 0.1.
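
The rule for unequal sub-ranges amounts to dividing each frequency by the width of its sub-range, so that area and not height represents frequency. A minimal sketch in Python, with invented classes (not the figures of Table 1.1) and an assumed function name rectangle_heights:

```python
def rectangle_heights(classes):
    """Return one height per (lower, upper, frequency) class so that area = frequency."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Hypothetical classes with unequal sub-ranges (not the figures of Table 1.1).
classes = [(0, 2, 30), (2, 4, 44), (4, 14, 50)]
heights = rectangle_heights(classes)
print(heights)  # [15.0, 22.0, 5.0]

# Height contributed by a single unit of frequency, as quoted in the text.
print(rectangle_heights([(0, 2, 1), (0, 10, 1)]))  # [0.5, 0.1]
```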

There is an important distinction between the above two types 
of diagram; whereas in the frequency polygon the height measures 
the frequency in the sub-range around the ordinate, in the histogram 
the area under the diagram between two ordinates measures the
frequency of individuals in the range between the values at which
the ordinates cut the x-axis. The second conception is more useful
in the development of frequency curves, and the former should be
regarded as an approximation to it. The frequency polygon is more
appropriate, however, when two or more curves are superimposed
for comparison, because the eye can follow the sloping lines more 
easily than it can the steps of the histogram. For such comparisons 
the frequencies should usually be reduced to the same terms by 
being given as percentages of the total. 

A third method of graphical representation which is rather less 
used is the cumulative diagram, that in some contexts is known as
the ogive and in others as the survivor curve. For this, along one 
axis is measured the character, and along the other is measured the 
total number of observations having a value greater or less than the
corresponding character value. 
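
The quantities plotted in a cumulative diagram may be computed as in the following Python sketch. The classes are invented and the function name cumulative_counts is an assumption; the counts "below" give the ogive and the counts "above" give the survivor curve.

```python
from itertools import accumulate


def cumulative_counts(classes):
    """For (lower, upper, frequency) classes in ascending order, return
    (upper boundary, number below it, number above it) for each class."""
    freqs = [freq for _, _, freq in classes]
    total = sum(freqs)
    below = list(accumulate(freqs))  # running total of frequencies
    return [(upper, b, total - b) for (_, upper, _), b in zip(classes, below)]


# Hypothetical grouped data (not taken from Table 1.1).
classes = [(26.5, 29.5, 2), (29.5, 32.5, 5), (32.5, 35.5, 9), (35.5, 38.5, 4)]
for boundary, less, greater in cumulative_counts(classes):
    print(f"{boundary}: {less} below, {greater} above")
```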

The examples of Table 1.1 are given graphically in one form or
another in Fig. 1.

Frequency tables and diagrams (we may use “distribution” as the
generic term) are a very economical way of representing data, and 
should always be given in some form, whatever constants are deduced 
from them, for they present practically all the information in the 
original data and do not occupy much room. The distribution of 
heights of recruits in the U.S. army, in which the diagram rises to 
a single hump, and falls off more or less symmetrically on each side, 
is a common form, particularly for anthropological measurements. 
The hump shows that the heights tend to be concentrated round a 
modal or fashionable value, and that the numbers become gradually 
less frequent towards the extremes. It may assist some readers to 
imagine a marksman firing at a target consisting of a bull’s-eye with 
a series of concentric rings. If the marksman fires many shots, they 
will be peppered all over the target, but most of them will be in the 
“bull,” rather fewer will be in the first circle, and very few will be 
in the outer circles, and in fact, if the target is divided into two 
halves by a vertical line through the centre of the “bull,” a frequency 
distribution of the number of shots falling in each semicircular 
zone can be made. If the marksman has no bias, the distribution will 
resemble, superficially at least, the distribution of Table 1.1 (i).
Further, if there are two marksmen, one good and one poor, the 
poor one will have fewer shots near the centre and more in the 
outer rings, and if both frequency diagrams are drawn on the same 
scale, that of the good marksman will be taller and thinner than that 
of the poor one. This difference in the degree of dispersion is an 
important property of frequency distributions. We have given the 
above as an illustration, but it must not be pressed as an analogy, 
for it rather suggests the deviations from the “bull” or centre as
errors, and all statistical variations should not be so regarded. 

Fig. 1. Frequency diagrams (figure not reproduced; the one surviving
label reads “(iv) Number of vice-counties”).

Frequency distributions are not always symmetrical as in the
first example of Table 1.1, and the form of the second one is quite
common. Such a distribution is said to be skew, and is the sort
that would result if the marksman of the above illustration showed
a bias to one side or the other. The distances between reversals in
cotton (iii) and the distribution of species among vice-counties (iv)
illustrate more extreme forms of skewness, in which the first group
contains many individuals; indeed for the latter it is the most
frequent, showing that most of the 266 species are found only in fewer
than eight vice-counties. This is the form of the “hollow curve”
mentioned in Willis’s book, Age and Area, and similar distributions 
are given by the number of petals of some flowers. The distribution 
of degrees of cloudiness is of a very unusual form; it shows that 
for July the sky is usually either nearly completely overcast or 
completely clear, and that comparatively seldom is there a moderate 
amount of cloudiness. Sometimes there are two or more humps or 
modes as they are called, as in the rays of chrysanthemums, and such 
a distribution usually indicates some mixture of types in the data. 
Care must be taken, of course, not to mistake some small irregu- 
larity, such as often occurs, for a second mode. 

Formation of Frequency Tables 

1.11. The determination of the appropriate size of sub-group of a
frequency table or the number of classes into which the total range 
is divided is a compromise between giving too much and too little 
detail. If the variate is continuous and the sample is large, it has 
been found convenient to have between ten and twenty classes, but 
if the number of readings is less than 100 fewer groups show up 
better the essential features of the distribution. Care should be 
taken not to make the grouping too fine relative to the unit of 
measurement of the character. In dealing with the age returns of 
men, for instance, it is usually inadvisable to choose sub-ranges 
of less than a year, for many people give their age last birthday, so 
that the groups representing integral years contain an undue number 
of observations, while those representing fractions have compara- 
tively few. If the variate is discrete, the natural unit of grouping 
is the unit of variation, as in Table 1.1, (ii) and (vi), unless there
are too many for the size of the sample and the distribution is 
irregular; in such instances, several units may be combined to form 
one group as in the fourth example of Table 1.1, where the 71
vice-counties have been divided into 10 groups.

If the variate is continuous, the sub-ranges must also be con- 
tinuous; there must be no gap between the highest value in one 
and the lowest in the next. If this is borne in mind when making 
the measurements, there need be little difficulty in forming the 
distribution, but sometimes the data are collected without reference 
to the fact that they may have to be grouped, and then care is 
necessary. A common way of recording data is correct to the nearest
unit. Suppose, for example, the data in Table 1.3 are of a
continuous variate and are correct to the nearest unit, then the system
of classification into groups of three units is unsatisfactory in that
there is a gap between the sub-ranges; the first nominally stops
at 29 and the second starts at 30 units. Where does an individual
with an actual value of 29.7 units (say) go? The readings recorded
as 29 units may be anywhere between 28.5 and 29.5 units, so the
sub-range which nominally extends from 27 to 29 units actually
extends from 26.5 to 29.5; the nominal sub-range 30-32 actually
extends from 29.5 to 32.5 units, and so on. The actual sub-ranges
join up. 
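
The passage from nominal to actual sub-ranges may be sketched in Python as follows, assuming (as in the text) that readings are correct to the nearest unit. The function name true_limits and its unit parameter are assumptions of the sketch.

```python
def true_limits(nominal_low, nominal_high, unit=1.0):
    """Widen a nominal sub-range by half a recording unit at each end."""
    half = unit / 2
    return (nominal_low - half, nominal_high + half)


nominal = [(27, 29), (30, 32), (33, 35)]
actual = [true_limits(low, high) for low, high in nominal]
print(actual)  # [(26.5, 29.5), (29.5, 32.5), (32.5, 35.5)]

# Each actual sub-range ends where the next begins, so there is no gap.
assert all(a[1] == b[0] for a, b in zip(actual, actual[1:]))
```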

Qualitative data may, of course, be classified and formed into a 
frequency distribution in which the sub-groups are the qualities
described; in dealing with hair colour, for instance, red, fair, light brown,
dark brown, and jet black may be taken as the separate classes. 

Frequency Constants 

1.2. Although frequency diagrams are useful as giving a visual
impression of the characteristics of a sample, the evidence of the 
eye alone is not enough, and some numerical measure of the important 
characteristics should be used for more exact description and com- 
parison. The determination of such a measure carries the process 
of condensing the original data a stage further than the formation 
of a frequency distribution, and this can only be done by ignoring 
further features as being irrelevant to the particular investigation. 
Sometimes a special constant may be designed for some particular 
purpose; for example in the kinetic theory of gases, the mean square 
velocity is a sufficient description of the distribution of velocities 
of the molecules for the purpose of establishing energy relationships.
If the frequency diagram is of unusual shape, e.g. multi-modal, 
special methods of expression may be necessary, but for most 
purposes and for distributions commonly met with, there are
standard frequency constants that have been devised to express 
various characteristics. These constants will be described here. It 
may be emphasised at the outset, however, that in using constants 
the investigator should always keep the practical problem in view 
and choose methods that are appropriate. There is no virtue in 
calculating statistical constants in a blind, routine manner, without 
knowing first that they will be of some use. 
Proportionate Frequencies

1.21. For many purposes, the investigator is only interested in the
proportion of individuals having characters below or above a certain 
value, or between two. It can well be imagined that the U.S. 
recruiting official would be interested in knowing the proportion 
of men below (say) 65 inches in height, so that he would know to 
what extent the number of recruits would fall off if he were to make 
that the minimum height allowable, and from Table 1.1 (i) we see
that the ratio is 6 825 : 25 878, or about 26.4 per cent.
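
The arithmetic of this ratio can be checked in a line or two of Python; the two counts are those quoted above, and the variable names are merely illustrative.

```python
below_65 = 6825   # recruits under 65 inches, as quoted from Table 1.1 (i)
total = 25878     # all recruits in that distribution

proportion = below_65 / total
print(f"{100 * proportion:.1f} per cent")  # 26.4 per cent
```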

It usually (but not always) happens that the size of the sample 
is governed by some arbitrary conditions that have no objective 
significance, so that the distribution is expressed in a more general 
form if the frequencies are given as fractions or percentages of the 
total. 

When a distribution is represented diagrammatically by a histo- 
gram, the proportionate frequency between two limits is the area 
under the diagram between two ordinates drawn at those limits.

Frequencies and proportionate frequencies underlie nearly all 
methods of statistical representation. Whatever constants may be 
calculated or however elaborate may be the analysis, the final inter- 
pretation is in terms of frequencies. If the data are sufficiently 
numerous and the problem is simple enough for an answer in terms 
of frequencies to be obtained directly, there is no need to compute 
further constants. 

Position 

1.22. The location or position of a frequency distribution is some
single measure describing a value of the variate about which the 
observations are scattered. For example, it may be the bull’s-eye 
aimed at by our marksman. There are several constants for describing 
this. 

The most common constant is the mean or average, which is the 
sum of all the observations divided by the number.* It is well 
known and understood, but while admitting the great usefulness of 
the mean (it is perhaps the most useful constant of all), we would
remind readers that it only measures one characteristic of a distribu- 
tion, and that there are others of great importance. 

* This is the arithmetic mean; the geometric and harmonic means are 
not so much used in statistics. 
An alternative measure of position is the median. If all the
observations in a sample are ranked in order of value, the median 
is the value of the middle one if the total number is odd, or a value 
between the two middle ones if the total number is even. The 
number of individuals greater than the median value is equal to 
the number that are smaller, and an ordinate drawn at that value 
divides a histogram into two equal areas. The median is troublesome 
to find if the sample is large and the grouping broad, for all the 
individuals in the group containing the median have to be ranked in 
order. In the third example of Table 1.1 there are 1 117 observations,
so the median is the value of the 559th placed in order, and lies
somewhere between 16.5 and 20.5 scale divisions. In that group
of 94 individuals the median is the 6th in order of magnitude, and
we may take it as being approximately 16.5 + (6/94) × 4 = 16.76
divisions.
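
The interpolation used for the median of grouped data may be sketched in Python as below. The function name and its parameters are assumptions of the sketch; the figure of 553 observations below 16.5 divisions is not quoted in the text but is inferred from it, the median being the 559th overall and the 6th within its group.

```python
def interpolated_median(n, lower, width, group_freq, count_below):
    """Estimate the median of grouped data by linear interpolation.

    n           -- total number of observations (assumed odd here)
    lower       -- lower boundary of the group containing the median
    width       -- width of that group
    group_freq  -- number of observations in that group
    count_below -- number of observations below `lower`
    """
    median_rank = (n + 1) // 2              # the middle observation
    rank_in_group = median_rank - count_below
    return lower + rank_in_group / group_freq * width


# Figures quoted in the text; 553 = 559 - 6 is inferred, not quoted.
estimate = interpolated_median(n=1117, lower=16.5, width=4,
                               group_freq=94, count_below=553)
print(round(estimate, 2))  # 16.76
```

The sketch supposes that the observations are spread evenly through the group, which is the assumption implied by the simple proportion used in the text.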

The mode is that value of the variate about which the observations 
are most concentrated, that is the value at which the ordinate of the 
frequency diagram is highest, and in Table i.i (i), for instance, it is 
between 66 and 67 inches. It is not always easy to define accurately 
in a sample, for it may be anywhere in the most frequent group, 
or if the distribution is very flat at the top, and irregular, it is not
even easy to decide which would probably be the modal group in
the whole population; moreover, a distribution may have more than
one real mode, as has already been illustrated. Usually, however, if 
there is a single mode, the position can be found by assuming 
Pearson’s system of frequency curves as shown in section 1.24. 

When the distribution is symmetrical, the mean, median and mode 
always coincide, and in all other single-modal cases the median comes 
between the mean and the mode, the dispersion between these three 
constants being a measure of the extent of asymmetry of the 
distribution. For practical purposes, the following formula holds 
approximately if the asymmetry is only moderate: 

Mean − Mode = 3 (Mean − Median). 
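As a small illustration, the approximate relation can be rearranged to give a rough position for the mode when only the mean and median are known. The function name and the figures below are hypothetical and serve only to show the arithmetic; the relation itself holds only for moderate asymmetry.

```python
def mode_from_mean_and_median(mean, median):
    """Approximate mode from Mean - Mode = 3 (Mean - Median)."""
    return mean - 3.0 * (mean - median)


# Hypothetical figures, chosen only for illustration.
mean, median = 50.0, 49.0
mode = mode_from_mean_and_median(mean, median)
print(mode)
```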

Of these constants, the mean is the most fundamental from a 
theoretical point of view and is the only one that can be used in 
further analysis of the data. For the mere representation of the 
central tendency of a distribution, the median is sometimes recom- 
mended because it is said to be least affected by extreme individuals, 
but for most symmetrical uni-modal distributions, the mean is the 
most stable constant and is least affected by idiosyncrasies of the 
particular sample. This result is derived from the theory of errors. 
When the variate is the life of the individuals under some test of 
endurance, e.g. the life of electric lamps when burnt at a standard 
voltage, the median may be an economical constant to determine; 
for when half the individuals have failed, the median value may be 
determined without any further testing. Since the mode is the most 
typical value, it may be the most suitable single constant to use 
when the distribution is very skew. 


Variability or Dispersion 

1.23. There are several constants devised to measure the degree of 
variation or dispersion of the variate, which, in our analogy of 
section 1.1, is the quality that distinguishes the marksmen. The 
most fundamental of these are the second moment, or variance, and 
its square root, called the standard deviation. The second moment is 
the mean squared deviation from the mean, and may be written 

$$\mu_2 = \frac{S(x - \bar{x})^2}{N},$$

where $\mu_2$ is the second moment, x the successive values of the variate 
in the sample, $\bar{x}$ the mean, N the number of individuals, and S the 
summation for all values of x. The standard deviation is $\sqrt{\mu_2}$, 
and will usually be denoted by the symbol s or $\sigma$. When the standard 
deviation is expressed as a percentage of the mean, it is called the 
coefficient of variation. The computation of some of these constants 
will be illustrated at the end of the chapter. 
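The definitions may be tried on a small sample. The sketch below uses the ten values that appear later in Table 1.4; the function and variable names are our own, and the divisor is N, as in the definition of the second moment above.

```python
import math


def second_moment(values):
    """Mean squared deviation from the mean (divisor N, as in the text)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


sample = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]  # the values of Table 1.4
mean = sum(sample) / len(sample)
mu2 = second_moment(sample)        # second moment, or variance
sd = math.sqrt(mu2)                # standard deviation
cv = 100.0 * sd / mean             # coefficient of variation, per cent
print(mean, mu2, sd, cv)
```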

Another measure, the mean deviation, is often used. It is the sum of 
the deviations from the mean, irrespective of sign, divided by the 
number of observations. For frequency distributions of the “normal” 
form (which will be described in section 2.5), the standard deviation 
is 1.253 times the mean deviation. This is the basis of Peters' method 
of estimating the standard deviation. 
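A sketch of this method follows. It relies on a standard result that is not derived here: for the “normal” form the ratio of standard deviation to mean deviation is the square root of π/2, which is the source of the factor 1.253. The sample is the one used in Table 1.4 and the names are our own; with so few observations the estimate can only be rough.

```python
import math


def mean_deviation(values):
    """Sum of deviations from the mean, irrespective of sign, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


ratio = math.sqrt(math.pi / 2.0)   # about 1.253 for the normal form
sample = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]
md = mean_deviation(sample)
sd_estimate = ratio * md           # estimate of the standard deviation
print(round(ratio, 3), md, round(sd_estimate, 2))
```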


The quartile deviation is the deviation measured on either side of 
the mean, so chosen that as many observations lie beyond the quartile 
value as lie between it and the median, and ordinates drawn at the 
median and at the two quartiles divide the area under a histogram 
into four equal parts. If the distribution is asymmetrical, the upper 
and lower quartile deviations are not equal. The average of the two 
is half the difference between the two quartile values and is called 
the semi-interquartile distance. In Table 1.1 (iii), if all the 
observations were placed in order, we should have: 

279 obs. | first quartile | 279 obs. | median | 279 obs. | third quartile | 279 obs. 


The first quartile lies between 8.5 and 10.5 scale divisions, and 
the third between 28.5 and 32.5 divisions, and the quartile deviations 
are the distances of these values from the mean. Historically, 


TABLE 1.2 

Number in Sample      Mean Range / Standard Deviation 
        2                       1.128 
        5                       2.326 
       10                       3.078 
       50                       4.498 
      100                       5.015 
      500                       6.073 

this was one of the earliest measures of dispersion used, but there 
is no reason why the deviation of the ordinates dividing the frequency 
diagram in any other proportions should not be used and, indeed, 
K. Pearson has shown (1920) that the deviation of the ordinate which 
cuts off 1/14th of the area determines the dispersion most accurately 
for the “normal” distribution. 

At first sight the range, which is the difference between the greatest 
and smallest members of a sample, would appear to be the most 
natural index of dispersion; but it unfortunately varies for samples 
of different sizes taken from the same population, being smaller 
for the smaller samples. When the population form and sample 
size are constant, however, the mean range is proportional to the 
standard deviation, and if their ratio is known the latter can be 
found from the former. For the “normal” form of distribution, 
full tables are given in Biometrika, XVII (1925), p. 386, of the 
ratio 

Mean Range / Standard Deviation 

for samples of between 2 and 1 000 individuals, and an abstract of 
them is given in Table 1.2; from this we may see how much the range 
depends on the size of the sample. Of course, single values of the 
range are liable to rather large chance errors, but if it is convenient 


TABLE 1.3 

(Each column is a random group of 5; ".." marks values that are illegible in the source.) 

           54    ..    41    70    42    50    53    31    49    47 
           49    ..    51    56    40    52    52    56    41    28 
           50    ..    57    57    44    53    72    56    53    60 
           53    ..    32    47    41    43    56    59    53    47 
           70    ..    56    44    54    49    50    35    53    38 

Means .   55.2  55.8  47.4  54.8  44.2  49.4  56.6  47.4  49.8  44.0 
Ranges .   21    13    25    26    14    10    22    28    12    32 

           54    27    39    52    49    57    28    48    36    48 
           27    46    58    65    49    38    51    42    48    47 
           48    60    66    60    44    55    49    46    55    42 
           41    62    59    41    49    56    50    62    53    69 
           47    48    63    62    64    47    53    59    47    46 

Means .   43.4  48.6  57.0  56.0  51.0  50.6  46.2  51.4  47.8  50.4 
Ranges .   27    35    27    24    20    19    25    20    19    27 



to collect the data in groups of 5 or 10 (say), the mean range can be 
found quite quickly, and when divided by the appropriate constant 
obtained from the tables, can be converted to the standard deviation. 
The ratio does not seem to be very sensitive to moderate changes 
in the form of the distribution, and this method may be used in most 
practical experience when the frequency distribution is uni-modal 
and tails off fairly gradually to zero at the extremes. Because of its 
ease of computation, the range is convenient in routine work. In 
Table 1.3 there are 100 individuals from an artificially constructed 
“normal” population, and these are arranged in random groups of 5. 
The ranges are given, and their mean is 22.3. The constant given in 
Table 1.2 for samples of 5 is 2.326, whence we obtain for an estimate 
of the standard deviation a value 22.3/2.326 = 9.6; this is not far 
from the true standard deviation, which happens to be 10 units. 
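The arithmetic of this estimate may be sketched as follows. The twenty ranges are those of the groups of Table 1.3, and the divisor is the constant of Table 1.2 for samples of 5; the variable names are our own.

```python
# Ranges of the twenty random groups of 5 in Table 1.3.
ranges = [21, 13, 25, 26, 14, 10, 22, 28, 12, 32,
          27, 35, 27, 24, 20, 19, 25, 20, 19, 27]

RATIO_FOR_SAMPLES_OF_5 = 2.326     # mean range / standard deviation, Table 1.2

mean_range = sum(ranges) / len(ranges)
sd_estimate = mean_range / RATIO_FOR_SAMPLES_OF_5
print(mean_range, round(sd_estimate, 1))
```

For groups of another size the divisor must be changed to the corresponding entry of Table 1.2.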

This method of obtaining the standard deviation is only equivalent 
to the other method when the individuals are mixed and divided 
into sub-samples at random. 

All these measures of dispersion may be rather bewildering to the 
reader, but they are simply so many alternatives, which are exactly 
related for any given form of frequency distribution. In view of the 
theory of the next chapter and section 3.7, it will be well to regard 
the standard deviation as the fundamental measure of scatter, and 
the other constants as approximations to it. It is sometimes thought 
that these measures of variability apply only to distributions of the 
“normal” type, but this is not so; they are equally applicable to 
skew distributions, and, provided the shape is the same, may be 
used to compare one distribution with another. 

A qualitative appreciation of dispersion or variation may be 
acquired with little experience, and for comparative purposes any 
of the above measures may be appreciated fairly easily; the more 
precise interpretation of the standard deviation in terms of frequencies 
is dealt with in section 2.53. 

Shape 

1.24. The shape of a uni-modal frequency distribution may vary 
in two ways, in the degree of asymmetry, or in the flatness of the 
mode. This flatness of the mode (or kurtosis) is different from that 
flatness of the curve as a whole which arises from the dispersion, 
and is illustrated in Fig. 2, where there are several curves having 
the same standard deviation but varying kurtosis. These properties 
may be measured by constants derived from the third and fourth 
moments of the distribution. The third moment is 

$$\mu_3 = \frac{S(x - \bar{x})^3}{N},$$

and the fourth moment is 

$$\mu_4 = \frac{S(x - \bar{x})^4}{N},$$

where the symbols on the right-hand side of the equations have the 
same meaning as those used in defining the second moment. K. Pearson 
has derived two constants from the moments, which are independent 
of the dispersion of the distribution, and describe its shape. 
They are 

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} \quad \text{and} \quad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$

$\beta_1$ is zero if the distribution is symmetrical, while $\beta_2$ is usually about 
3 for a curve like that of Fig. 1.1 (i), is smaller if the mode is flatter, 
and larger if the curve is more sharply peaked. In Fig. 2 are given 
a few smooth frequency curves with different values of $\beta_1$ and $\beta_2$. 
They all have positive skewness, and a similar series with negative 
skewness can be imagined. 

The degree of skewness may be more simply measured by the 
quantity 

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}.$$

This quantity is negative if the curve shows a bias to the right 
(assuming that ascending values of the variate are taken from left 
to right) and has the longer tail to the left. The position of the 
mode, as we have shown, is not easy to define, but if the curve comes 
within Pearson’s* system, the following formula may be used: 

$$\text{Skewness} = \frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)},$$

in which $\sqrt{\beta_1}$ is given the same sign as $\mu_3$. From this we obtain the 
equation for fixing the mode of a distribution relative to its mean, 

$$\text{Mean} - \text{Mode} = \frac{\sigma\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}.$$


These constants of shape are rather difficult to interpret, and it is 
only after considerable experience that their practical import can 
be appreciated. 
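Some familiarity with these constants may be gained by computing them. The sketch below evaluates the two Pearson constants and the skewness from the second, third and fourth moments; the moments used are those found for the heights of fathers in the worked example of Table 1.5 later in this chapter, and the function and variable names are our own.

```python
import math


def pearson_shape(mu2, mu3, mu4):
    """Return beta1, beta2 and the skewness given by Pearson's formula."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    # The square root of beta1 takes the same sign as the third moment.
    root_beta1 = math.copysign(math.sqrt(beta1), mu3)
    skewness = root_beta1 * (beta2 + 3) / (2 * (5 * beta2 - 6 * beta1 - 9))
    return beta1, beta2, skewness


mu2, mu3, mu4 = 7.31594, -2.3003, 155.663   # moments from Table 1.5
beta1, beta2, skewness = pearson_shape(mu2, mu3, mu4)
mean_minus_mode = math.sqrt(mu2) * skewness  # sigma times skewness
print(round(beta1, 5), round(beta2, 4), round(mean_minus_mode, 4))
```

The negative sign of the result shows that the mode lies a little above the mean for these data.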


Computation of Moments 
Moments from Ungrouped Data 

1.31. It is always possible to compute the mean and the higher 
moments in a sample by straightforward evaluation of the expressions 
given in sections 1.22-1.24 as definitions. For example, the 
mean and second moment are calculated directly in the first three 
columns of Table 1.4 for a small sample of ten individuals. We have 
not yet dealt with the theory of methods for such small samples, 
and this one is given here merely to illustrate arithmetical processes 
that are applicable to small and large samples. The sums of the 
columns are given at the foot of the table, and from them it is seen 
that the mean $\bar{x}$ = 71.8 and the second moment $\mu_2$ = 12.96. 

* Pearson's system will be mentioned in section 3.6. 

Such calculations are much facilitated by making a transformation 
of the values of the variate, measuring them as deviations from 
some arbitrary origin or ‘Vorking mean” and dividing the deviations 
by an arbitrary constant; this last step is equivalent to representing 


TABLE 1.4 

          x      (x − x̄)    (x − x̄)²      x'      x'² 

         67       −4.8       23.04        −1       1 
         70       −1.8        3.24         0       0 
         76       +4.2       17.64        +2       4 
         73       +1.2        1.44        +1       1 
         67       −4.8       23.04        −1       1 
         73       +1.2        1.44        +1       1 
         70       −1.8        3.24         0       0 
         73       +1.2        1.44        +1       1 
         70       −1.8        3.24         0       0 
         79       +7.2       51.84        +3       9 

Total   718        0.0      129.60        +6      18 



the variation on an arbitrary scale. If x' is the transformed variate, 
X the arbitrary origin, and h the arbitrary constant or scale, 

$$x' = \frac{x - X}{h} \quad \text{and} \quad x = X + hx'. \qquad (1.1)$$

Further, let the mean of the values of x' be $\bar{x}'$, and the mean of 
their sth power be $\mu'_s$,* so that 

$$\bar{x}' = \frac{Sx'}{N} \quad \text{and} \quad \mu'_s = \frac{Sx'^s}{N}.$$

Then it may be shown that: 

$$\bar{x} = X + h\bar{x}',$$
$$\mu_2 = h^2(\mu'_2 - \bar{x}'^2),$$
$$\mu_3 = h^3(\mu'_3 - 3\mu'_2\bar{x}' + 2\bar{x}'^3),$$
$$\mu_4 = h^4(\mu'_4 - 4\mu'_3\bar{x}' + 6\mu'_2\bar{x}'^2 - 3\bar{x}'^4). \qquad (1.2)$$

* The letter $\mu$ means that the constant is the mean of some power of 
deviations, the subscript s denotes the order of the power (e.g. for the 
second moment, s = 2) and the prime indicates that the deviations are from 
an arbitrary origin in arbitrary units. Consistent with this notation $\bar{x}' = \mu'_1$. 
Where there is no prime, the deviations are measured from the mean in the 
units of the variate, and $\mu_s$ is the sth moment. 


To illustrate the use of these equations the mean and second moment 
have been calculated for the data of Table 1.4. The arbitrary origin 
X = 70, the constant h = 3, and the values of x' are given in the 
fourth column. From their sum, $\bar{x}'$ = 0.6 and 

$\bar{x}$ = 70 + 3 × 0.6 = 71.8. 

The sum of the values of x'², given in the fifth column, gives 
$\mu'_2$ = 1.8, and from equation (1.2), 

$\mu_2$ = 9(1.8 − 0.36) = 12.96. 

These results agree with those calculated directly, as indeed they 
should. Readers are recommended to calculate the third and fourth 
moments for Table 1.4 by the two methods. 
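The comparison of the two methods may also be made mechanically. The sketch below computes the moments of the Table 1.4 sample directly and again from the transformed variate with X = 70 and h = 3, using equations (1.2) with the third and fourth relations in their usual form; the function names are our own.

```python
sample = [67, 70, 76, 73, 67, 73, 70, 73, 70, 79]   # Table 1.4
X, h = 70, 3                                        # arbitrary origin and scale
n = len(sample)


def central_moment(values, s):
    """Direct method: mean of the s-th power of deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** s for v in values) / len(values)


coded = [(x - X) / h for x in sample]               # transformed variate x'
xbar_c = sum(coded) / n                             # mean of x'


def raw(s):
    """Mean of the s-th power of x' about the arbitrary origin."""
    return sum(c ** s for c in coded) / n


mean = X + h * xbar_c
mu2 = h ** 2 * (raw(2) - xbar_c ** 2)
mu3 = h ** 3 * (raw(3) - 3 * raw(2) * xbar_c + 2 * xbar_c ** 3)
mu4 = h ** 4 * (raw(4) - 4 * raw(3) * xbar_c
                + 6 * raw(2) * xbar_c ** 2 - 3 * xbar_c ** 4)

print(mean, mu2)
print(mu3, central_moment(sample, 3))
print(mu4, central_moment(sample, 4))
```

Each pair of printed values should agree apart from rounding in the last figures.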

1.311. The proof of equations (1.2) depends on applying the 
ordinary rules of addition to summations denoted by the sign S. 
The first is that the summation of a compound term itself made 
up of the sum of several terms is equal to the sum of the summations 
of the separate terms. For example, 

$$S(X + hx') = SX + Shx'.$$

The second rule is that the summation of a term that is 
multiplied by a factor constant for all values of the term summed, 
is equal to that constant multiplied by the summation of the term. 
For example, 

$$Shx' = hSx'.$$

The general significance of these rules should be appreciated.* 

From equation (1.1) and the above examples of the summation 
rules the first part of equation (1.2) follows easily. It also follows 
that 

$$x - \bar{x} = h(x' - \bar{x}'),$$

and 

$$\mu_s = \frac{S(x - \bar{x})^s}{N} = \frac{Sh^s(x' - \bar{x}')^s}{N}.$$

Taking the constant term outside the S sign and expanding the 
term in brackets by the binomial theorem, we find that 

$$\mu_s = \frac{h^s}{N}\,S\left\{x'^s - s\,x'^{s-1}\bar{x}' + \frac{s(s-1)}{2!}\,x'^{s-2}\bar{x}'^2 - \ldots + (-1)^{s-1}s\,x'\bar{x}'^{s-1} + (-1)^s\bar{x}'^s\right\}.$$

Summing this term by term, and remembering that $\bar{x}' = \mu'_1$ and 
is constant for all individuals in the sample, we combine the last 
two terms, and find that 

$$\mu_s = h^s\left\{\mu'_s - s\,\mu'_{s-1}\mu'_1 + \frac{s(s-1)}{2!}\,\mu'_{s-2}\mu_1'^2 - \ldots + (-1)^{s-1}(s-1)\mu_1'^s\right\}.$$

If s is successively put equal to 2, 3 and 4, the last three of 
equations (1.2) result. 

* Readers will do well to master these rules for the sake of following later 
chapters. 


Moments from Grouped Data 

1.32. When calculating the moments of a frequency distribution, 
it is usual to assume that all the individuals in any one group have 
the central value of that group, i.e. the value midway between the 
limits of the sub-range. For Table 1.1 (i) the central values are 
51.5, 52.5, etc., inches. These values may be transformed as shown 
in section 1.31. Then to calculate the moments, instead of summing 
over all individuals, it is possible to multiply the appropriate power 
of the transformed group value by the frequency and to sum these 
products over all groups. Let $x'_t$ be the tth group value (the subscript 
denotes that the transformed variate can only take discrete values 
corresponding to the central values of the groups), $n_t$ the frequency 
in that group, and S be the summation over all groups, then for 
such data, 

$$Sn_t = N \quad \text{and} \quad \nu'_s = \frac{Sn_t x_t'^s}{N}. \qquad (1.3)$$

The letter $\nu'$ is used for the mean of a power of a transformed 
variate calculated from grouped data where $\mu'$ would be used if the 
data were ungrouped. 

For example, the data in Table 1.4 may be grouped and the sum 
of the fourth column is 

2 × (−1) + 3 × 0 + 3 × 1 + 1 × 2 + 1 × 3. 

Using equations (1.3) and applying equations (1.2) to the resulting 
means as shown above, it is possible to compute the crude mean 
and moments from grouped data. These moments are denoted by 
$\nu_2$, $\nu_3$ and $\nu_4$. For a continuous variate some of the crude moments $\nu$ 
differ slightly from the true moments denoted by $\mu$, even when 
calculated from very large (infinite) samples, because the discontinuous 
form of a frequency distribution with finite groups is only 
an approximation to the continuous form that is possible with a 
continuous variate. Sheppard’s corrections, when applied to such 
crude moments calculated from a table with uniform sub-ranges, 
give values that approximate more closely to the true moments. 
The corrected mean and moments so obtained are: 

$$\bar{x} = \text{crude mean},$$
$$\mu_2 = \nu_2 - \tfrac{1}{12}h^2,$$
$$\mu_3 = \nu_3,$$
$$\text{and} \quad \mu_4 = \nu_4 - \tfrac{1}{2}h^2\nu_2 + \tfrac{7}{240}h^4. \qquad (1.4)$$

The above transformations make the computation of moments 
from grouped data a comparatively easy matter if the arbitrary 
origin X is chosen to be the central value of one of the groups 
near the centre of the whole distribution and h is made equal to 
the sub-ranges, so that the values of the transformed variate $x'_t$ are 
0, 1, 2, 3 ... −1, −2, −3 ... etc. The corrections (1.4) can 
then be applied to the arbitrary moments, putting h equal to 1, and 
the final true moments be found by multiplying by $h^2$, $h^3$ and $h^4$; 
the $\beta$ coefficients, being ratios, are found directly. 

The whole process is followed in the example of Table 1.5, the 
data being the heights of 1 078 fathers (Pearson and Lee, 1903). 
Column (1) contains the sub-ranges, and column (2) the values of 
the centres of the sub-ranges in the arbitrary units. Column (3) 
contains the frequencies (the 0.5 frequencies arise because when 
an observation falls exactly on the border-line between two groups, 
TABLE 1.5 

   (1)          (2)         (3)          (4)         (5)          (6) 
Stature in   Arbitrary   Frequency    n_t x'_t   n_t x'_t^2   n_t x'_t^3 
  Inches       Units        n_t 

  58.5-         −9           3          −27         243        −2 187 
  59.5-         −8           3.5        −28         224        −1 792 
  60.5-         −7           8          −56         392        −2 744 
  61.5-         −6          17         −102         612        −3 672 
  62.5-         −5          33.5       −167.5       837.5      −4 187.5 
  63.5-         −4          61.5       −246         984        −3 936 
  64.5-         −3          95.5       −286.5       859.5      −2 578.5 
  65.5-         −2         142         −284         568        −1 136 
  66.5-         −1         137.5       −137.5       137.5        −137.5 
  67.5-          0         154            0           0            0 
  68.5-          1         141.5        141.5       141.5         141.5 
  69.5-          2         116          232         464           928 
  70.5-          3          78          234         702         2 106 
  71.5-          4          49          196         784         3 136 
  72.5-          5          28.5        142.5       712.5       3 562.5 
  73.5-          6           4           24         144           864 
  74.5-          7           5.5         38.5       269.5       1 886.5 

  Total                  1 078         −326       8 075        −9 746 

ν'1 = −326/1 078           = −0.30241 

ν'2 = 8 075/1 078          =  7.49072 
−ν'1²                      = −0.09145 
ν2                         =  7.39927 
−1/12                      = −0.08333 
μ2                         =  7.31594 
σ                          =  2.7048 inches 

ν'3 = −9 746/1 078         = −9.0408 
−3ν'2ν'1                   = +6.7958 
                             −2.2450 
+2ν'1³                     = −0.0553 
ν3 = μ3                    = −2.3003 

β1 = μ3²/μ2³               =  0.01351 
TABLE 1.5 (continued) 

(Only the headings of columns (7), (11) and (12) are legible in the source.) 

    (7)          (8)         (9)        (10)        (11)        (12)      (13) 
 n_t x'_t^4                                          n_t          z 
                                                  (expected) 

  19 683      −3.0308     0.00122       1.3         1.3       0.0040       1.6 
  14 336      −2.6610     0.00390       4.2         2.9       0.0116       4.6 
  19 208      −2.2913     0.01097      11.8         7.6       0.0289      11.5 
  22 032      −1.9216     0.02733      29.5        17.7       0.0630      25.1 
  20 937.5    −1.5519     0.06034      65.0        35.5       0.1197      47.7 
  15 744      −1.1822     0.11856     127.8        62.8       0.1983      79.0 
   7 735.5    −0.8125     0.20825     224.5        96.7       0.2868     114.3 
   2 272      −0.4428     0.32896     354.6       130.1       0.3617     144.2 
     137.5    −0.0731     0.47086     507.6       153.0       0.3979     158.6 
       0       0.2967     0.61665     664.7       157.1       0.3818     152.2 
     141.5     0.6664     0.74742     805.7       141.0       0.3195     127.3 
   1 856       1.0361     0.84992     916.2       110.5       0.2332      92.9 
   6 318       1.4058     0.92010     991.9        75.7       0.1485      59.2 
  12 544       1.7755     0.96209   1 037.1        45.2       0.0825      32.9 
  17 812.5     2.1452     0.98403   1 060.8        23.7       0.0400      15.9 
   5 184       2.5149     0.99404   1 071.6        10.8       0.0169       6.7 
  13 205.5     2.8846     0.99804   1 075.9         6.4       0.0062       2.5 

 179 147                                          1 078.0 

ν'4 = 179 147/1 078        = 166.185 
−4ν'3ν'1                   = −10.936 
                             155.249 
+6ν'2ν'1²                  =  +4.110 
                             159.359 
−3ν'1⁴                     =  −0.025 
ν4                         = 159.334 
−½ν2                       =  −3.700 
                             155.634 
+7/240                     =  +0.029 
μ4                         = 155.663 

β2 = μ4/μ2²                =   2.9083 

mean − mode = −0.1701 

mean = 68.0 − 0.3024 = 67.6976 inches, 

mode = 67.8677 inches 





a half is put into each), and columns (4), (5), (6) and (7) are obtained 
successively from the previous one by multiplying by the corresponding 
units in column (2). In column (4) −27 = 3 × −9, in 
column (5) 243 = −27 × −9, in column (6) −2 187 = 243 × −9 
and in column (7) 19 683 = −2 187 × −9, and so on for the other 
rows. The arithmetic may be checked by multiplying each frequency 
directly by $x_t'^4$ and so checking column (7) term by term; if that 
is correct the other columns from which that was obtained must 
almost certainly be right. The sums of these columns (paying regard 
to sign) give the $\nu$ coefficients,* and these are corrected step by step 
below the table. The resulting moments are in inch-units already, 
since the sub-groups are 1 inch wide. The mean − mode is found 
from the formula of section 1.24; $\nu'_1$ = −0.3024, and so the mean is 
0.3024 inch less than the arbitrary origin, which is 68.0 inches 
(the centre of the group at which $x'_t$ = 0). Hence mean height 
= 68.0 − 0.3024 = 67.6976 inches. We shall refer to columns 
(8) to (13) of Table 1.5 in the next chapter. 
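The whole computation of Table 1.5 may be sketched in Python as a check on the arithmetic. The frequencies and arbitrary units are those of columns (2) and (3); the variable names are our own, and Sheppard's corrections are applied with h = 1 because the sub-ranges are 1 inch wide.

```python
import math

units = list(range(-9, 8))                      # column (2): -9, -8, ..., 7
freqs = [3, 3.5, 8, 17, 33.5, 61.5, 95.5, 142, 137.5, 154,
         141.5, 116, 78, 49, 28.5, 4, 5.5]      # column (3)
N = sum(freqs)


def raw(s):
    """Mean of the s-th power of the arbitrary units, weighted by frequency."""
    return sum(f * u ** s for f, u in zip(freqs, units)) / N


v1, v2, v3, v4 = raw(1), raw(2), raw(3), raw(4)

# Crude moments about the mean, by equations (1.2) with h = 1.
nu2 = v2 - v1 ** 2
nu3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
nu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4

# Sheppard's corrections for grouping.
h = 1.0
mu2 = nu2 - h ** 2 / 12
mu3 = nu3
mu4 = nu4 - 0.5 * h ** 2 * nu2 + 7 * h ** 4 / 240

mean = 68.0 + h * v1                            # arbitrary origin is 68.0 inches
sigma = math.sqrt(mu2)
print(N, round(mean, 4), round(sigma, 4))
print(round(mu2, 5), round(mu3, 4), round(mu4, 3))
```

Working to the full precision of the machine avoids the accumulation of rounding errors that hand computation has to guard against.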

In this example, the constants have been calculated correct to 
several decimal places. This has been done advisedly, for when 
complicated computations are performed, errors due to “dropping 
figures” too soon are apt to accumulate and become very large, and 
it is always well to be on the safe side. The number of figures used 
in computing this example is near the minimum advisable. The 
final result may be “rounded off” to a few figures, if it is not going 
to be used in further calculations. 



CHAPTER II 


DISTRIBUTIONS DERIVED FROM THEORY OF 
PROBABILITY 


Probability 

2.1. There are events about which our knowledge is so complete 
that we are able to predict with complete certainty whether or not 
they will occur, and on the other hand there are events about which 
we know nothing, so that we are unable to make any prediction. 
An event of the first kind is the falling of a penny tossed into the 
air (we are certain it will fall), and one of the second kind is the 
falling of the penny with the head (say) uppermost. Between these 
two extremes there are events about which we know something, but 
not enough to allow of certain prediction; these are the province 
of the theory of probability. The extent to which we can rely on a 
prediction varies in degree for different events depending on the 
amount of knowledge we have of the factors that determine the event, 
and a measure of the degree is called the probability of the event. 

Probability is expressed on a somewhat arbitrary scale of numbers 
between unity and zero, a value of 1 or 0 corresponding to certainty 
that the event will or will not occur, and intermediate values to 
intermediate degrees of certainty; a probability of 0.5 means that 
the event is as likely to occur as not. Defined in this vague way, 
there seems to be little objectivity about probability as a measure, 
but precision is achieved by giving the word a special meaning and 
concreteness by applying it to a restricted field.* These two steps 
result in mathematical and statistical views of probability. By 
extending the conceptions of these special fields to the events of 
ordinary life, the probability of any event is usually expressed as 
chances for or against the event, and the analogy makes the idea of 
probability fairly definite and concrete. We shall see how this arises 
in the next two sections. 

* A general calculus of probabilities has been developed for application 
to the general problem of expressing the relations between events and the 
amount of knowledge of the determining factors, or between propositions 
and the data on which they are based. The application of this calculus has 
not received universal acceptance, but this need not concern us, as we are 
only interested here in the mathematical and statistical applications that are 
almost, if not quite, universally accepted. 
Mathematical Probability 

2.11. This view of probability is based on a conception of a number 
of equally likely chances, some of which are favourable to the 
occurrence of the event and some of which are unfavourable. The 
ratio of the favourable to the total chances is the probability of the 
event. This leads to a similar scale to that mentioned above, varying 
between zero and unity. It is usual and helpful to imagine an 
experiment like that of throwing a perfect six-sided die, all sides of 
which are equally likely to turn up. Only one side has a six, so the 
probability of turning up a six is 1/6. Other idealised games of chance 
suggest other typical experiments: drawing blindfold from a bag 
of well-mixed balls that differ only in colour, or drawing cards from 
a well-shuffled pack, or spinning a perfectly balanced roulette wheel. 

The calculus of mathematical probabilities is purely concerned 
with finding the probability of composite events knowing those of 
the simple components, and as such it is an exercise in permutations 
and combinations, enumerating the chances for and against the 
composite event. There are two fundamental rules that should be 
understood. 

Rule I. The probability of occurrence of one or other of a 
number of independent events, only one of which can occur at a time, 
is the sum of the probabilities of the separate events. For example, 
it is easy to see that when throwing a die, two of the six equally 
likely chances favour the turning up of either a five or a six and 
that the probability is 1/6 + 1/6 = 1/3. 

Rule II. The probability of the simultaneous occurrence of a 
number of independent events is the product of their separate 
probabilities. Thus, when throwing two dice, there are 36 equally likely 
possibilities, and of these, only one is favourable to a double six; 
the probability of a double six is 1/36, and this is 1/6 × 1/6. 

The word independent has its ordinary English meaning in the 
above rules. 
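Both rules may be verified for the dice examples by direct enumeration of the equally likely chances. The sketch below is our own illustration and uses exact fractions so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                       # the six equally likely faces

# Rule I: a five or a six on a single die.
favourable = sum(1 for f in faces if f in (5, 6))
p_five_or_six = Fraction(favourable, 6)

# Rule II: a double six with two dice (36 equally likely pairs).
pairs = list(product(faces, repeat=2))
double_sixes = sum(1 for a, b in pairs if a == 6 and b == 6)
p_double_six = Fraction(double_sixes, len(pairs))

print(p_five_or_six, p_double_six)
```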

Sometimes the chances for and against an event are specified 
instead of the probability; a probability m/(m + n) may be described 
by the statement “the chances are m to n for the event.” 
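The conversion between the two forms of statement is a matter of simple arithmetic, as the following sketch shows; the function names are our own.

```python
from fractions import Fraction


def chances_to_probability(m, n):
    """Chances of m to n for the event correspond to probability m/(m + n)."""
    return Fraction(m, m + n)


def probability_to_chances(p):
    """Return (m, n) in lowest terms for a probability given as a Fraction."""
    return p.numerator, p.denominator - p.numerator


print(chances_to_probability(1, 5))              # a six with one die
print(probability_to_chances(Fraction(1, 36)))   # a double six with two dice
```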

Statistical Probability 

2.12. The mathematical laws of probability find much use in the 
theory of statistics for calculating the relations between samples and 
populations. Superficially, and from the statistical standpoint, the 
drawing of a sample of (say) 1 000 men from the population of 
England is very like drawing balls from a bag, and it is assumed 
that the technique of sampling may be made to satisfy the essential 
conditions for the application of the laws of probability. Then, the 
occurrence of an event becomes the drawing of an individual of a 
given character, the set of equally likely chances becomes a 
population of equally likely individuals, and a given combination of 
events becomes a sample of given composition. Samples which 
follow the laws of mathematical probability are called random 
samples; they satisfy the essential condition that every individual 
must be independent; i.e. its character must not be influenced in 
any way by the character of any other individual in the sample. 

On this, the statistical view, the probability of drawing an indi- 
vidual of a given character is the proportion of individuals in the 
population having that character; statistical probability is a propor- 
tionate frequency. Conversely, any probability statement may be 
interpreted in those terms. When we say that the probability of a 
given horse winning a given race is one-fifth, we may imagine a 
very large number or infinite population of races run under the 
same conditions, in one-fifth of which races the given horse wins. 

This interpretation of probability is a little more concrete and so 
is a little more satisfying practically than its most general description 
as a ratio expressing a degree of confidence or knowledge, or a ratio 
of chances, but it is still not quite satisfactory. A population is 
rarely known, even when it is such a concrete thing as a bulk of 
corn that is being sampled, for if it were known, there would be no 
question of sampling.* However, it has been found in statistical 
experience that small samples from the same bulk vary among 
themselves considerably, but that as they increase in size they 
become more stable and are less subject to sporadic variations. It 
is assumed that this tendency continues indefinitely as the size of 
the sample increases, that there is a single limiting form to which 
a sample from a given population tends as the sample increases in 
size. This limiting sample is what the statistician conceives of as the 
infinite population, and in more ordinary language may be described 
as the result that would be obtained in the “long-run” of experience. 
The probability of a given kind of individual is the proportion of 
individuals of that kind in this infinite population. We do expect, 
in the long-run, that events having given probabilities will occur and 
fail in nearly the relative proportions specified by the probabilities. 

* We are only dealing with instances in which the population is very 
large compared with any samples that are likely to be drawn. 

The above postulate of the stability of proportionate frequencies 
in infinite samples cannot be proved, for owing to the limitations 
of human patience and powers it is impossible to increase the size 
of an actual sample indefinitely, but on it has been based the whole 
theory of statistical sampling, the extensive use of which has failed 
to reveal any inconsistencies that throw doubt on the basic postulate. 
This will be discussed again in section 2.4. 
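Although the postulate cannot be proved, its working can be imitated by artificial sampling. The sketch below throws an imaginary die by means of a pseudo-random number generator, which is only a stand-in for a real die and is our own assumption, and prints the proportion of sixes in samples of increasing size.

```python
import random


def proportion_of_sixes(n_throws, rng):
    """Proportion of sixes in n_throws of a simulated fair die."""
    sixes = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == 6)
    return sixes / n_throws


rng = random.Random(2024)            # fixed seed so that the run can be repeated
for n in (10, 100, 1000, 10000, 100000):
    print(n, proportion_of_sixes(n, rng))
```

The proportions in the small samples vary a good deal, while those in the large samples lie close to 1/6, which is the behaviour that the postulate describes.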

The infinite population is seen to be distinct from the bulk that 
is being sampled, and is an abstraction in that its physical existence 
cannot be shown; it is dependent on the technique of the sampling 
as well as on the bulk being sampled. Since the interest lies in the 
characteristics of the bulk, it is important to try and arrange the 
sampling technique so that the infinite population and the bulk 
sampled are substantially the same. 

2.13. The foregoing discussion of probability may be summed up 
roughly in the following terms. It is assumed that in the long-run 
of experience, the proportions of occurrences of different kinds of 
chance events will tend to have stable values that define the infinite 
population. The long-run proportion of any one kind of event is 
its probability. For random sampling, the probabilities of composite 
events may be calculated from those of simple events by using the 
laws of mathematical probability. In practice, all probability state- 
ments are interpreted in terms of proportionate frequencies in the 
long-run of experience, even when used to measure a degree of 
subjective confidence that the event will happen. Thus, an event 
that happens frequently in the long-run has a high probability, and 
on any single occasion we have a considerable degree of confidence 
that an event with a high probability will happen. 

We shall now introduce some theoretical frequency distributions 
that are derived from the laws of mathematical probability, and 
describe some of the applications to statistical data. 

Binomial Distribution 

2.2. If we had four perfect dice and threw them together, noting
the number of sixes that turned up, we should find at different
throws, four, three, two, one and zero sixes, and if we made enough
throws we could form a frequency distribution in which the variate
was the number of sixes per throw, or alternatively the number of
“not-sixes,” and the individuals were throws. This is the
distribution dealt with in this section.

The imaginary experiment may be generalised by calling the casting
of a six the occurrence of an event or a success, the casting of a number
other than a six a non-occurrence or failure, and the throw of four
dice a set of n trials (n = 4). Then if the probability of a success is
p and that of a failure is q (p + q = 1), in a set of n independent
trials the probability of (n − s) successes and s failures is*

$$\frac{n(n-1)\cdots(n-s+1)}{s!}\,p^{n-s}q^{s} \tag{2.1}$$

* This follows from the rules of mathematical probability given in
Section 2.11. The probability that (n − s) particular trials will be successes
and s will be failures is $p^{n-s}q^{s}$ (Rule II). There are

$$\frac{n(n-1)\cdots(n-s+1)}{s!}$$

ways in which (n − s) successes and s failures may occur in a sample
of n (this is a result from the theory of combinations), so from Rule I, the
probability that any (n − s) trials will be successes and any s will be
failures is the expression (2.1).


TABLE 2.1

Number of Occurrences   Number of Non-occurrences   Proportion of Sets
per Set                 per Set

n                       0                           $p^{n}$
n − 1                   1                           $n\,p^{n-1}q$
n − 2                   2                           $\frac{n(n-1)}{2!}\,p^{n-2}q^{2}$
n − 3                   3                           $\frac{n(n-1)(n-2)}{3!}\,p^{n-3}q^{3}$
. .                     . .                         . .
0                       n                           $q^{n}$

Total                   . .                         1


The probabilities for different values of s are set out in Table 2.1
in the form of a frequency distribution with proportionate
frequencies. This is called the binomial frequency distribution because
the proportionate frequencies are the terms in the expansion of the
binomial $(p + q)^{n}$. For example, for the four perfect dice the
distribution of sixes is the expansion of the binomial
$(\tfrac{1}{6} + \tfrac{5}{6})^{4}$ and is
shown in Table 2.2. The variate of this distribution is discrete, and
it will be noted that all the proportionate frequencies may be
described in terms of the two constants p (or q) and n.


TABLE 2.2

Number of Sixes          4        3         2          1          0

Proportion of Throws     1/1296   20/1296   150/1296   500/1296   625/1296

Percentage of Throws     0.08     1.54      11.57      38.58      48.23
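As a rough modern check on Table 2.2, the terms of the binomial may be computed directly. The following is only a sketch in Python; the function name `binomial_terms` is our own and is not part of the original treatment.

```python
from fractions import Fraction
from math import comb


def binomial_terms(n, p):
    """Return {number of successes: proportion of sets} for n independent trials."""
    q = 1 - p
    # Each term is (number of arrangements) x p^successes x q^failures.
    return {k: comb(n, k) * p**k * q ** (n - k) for k in range(n, -1, -1)}


# Four perfect dice: a success is the casting of a six, so p = 1/6.
terms = binomial_terms(4, Fraction(1, 6))
for sixes, proportion in terms.items():
    print(sixes, proportion, f"{100 * float(proportion):.2f} per cent")
```

Exact fractions are used so that the proportions may be compared with the table without rounding; Python prints them in lowest terms, so 20/1296 appears as 5/324.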



It is a matter of algebra to calculate the moments of the general
distribution given in Table 2.1; they are:

$$
\begin{aligned}
\text{Mean Number of Occurrences, } \bar{x} &= np,\\
\text{Second Moment, } \mu_2 &= npq = \bar{x}\left(1 - \frac{\bar{x}}{n}\right),\\
\text{whence Standard Deviation, } \sigma &= \sqrt{npq} = \sqrt{\bar{x}\left(1 - \frac{\bar{x}}{n}\right)},\\
\text{Third Moment, } \mu_3 &= npq(q - p), \qquad \beta_1 = \frac{(q-p)^2}{npq},\\
\text{Fourth Moment, } \mu_4 &= npq\,[1 + 3(n-2)pq], \qquad \beta_2 = 3 + \frac{1}{npq} - \frac{6}{n}.
\end{aligned}
\tag{2.2}
$$

It will be noted that the second and fourth moments are symmetrical
in p and q, i.e. the result is the same whether the occurrences
or failures are the variate, and that for the third moment the
interchange of p and q merely alters the sign.
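The algebra behind equations (2.2) may be checked numerically by summing over the terms of the distribution. The sketch below does this for one arbitrary pair of values (n = 10, p = 0.2, chosen purely for illustration); the function name `binomial_moments` is our own.

```python
from math import comb


def binomial_moments(n, p):
    """Mean and central moments of the number of occurrences, summed term by term."""
    q = 1.0 - p
    probs = [comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    mu = {r: sum((k - mean) ** r * pr for k, pr in enumerate(probs)) for r in (2, 3, 4)}
    return mean, mu[2], mu[3], mu[4]


n, p = 10, 0.2  # illustrative values only
q = 1.0 - p
mean, mu2, mu3, mu4 = binomial_moments(n, p)

# Each line prints the summed value beside the closed form of equations (2.2).
print(mean, n * p)
print(mu2, n * p * q)
print(mu3, n * p * q * (q - p))
print(mu4, n * p * q * (1 + 3 * (n - 2) * p * q))
```

Interchanging p and q in the call leaves the second and fourth moments unchanged and reverses the sign of the third.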


Poisson Series 

2.3. A binomial distribution may be imagined in which the
probability of a failure, q, is very small, that of a success, p, is nearly
equal to unity, and the number of trials per set, n, is exceedingly
large, so that the mean number of failures per set, m = nq, is of
moderate dimensions. Then for any moderate value, s is negligible
compared with n and (n − s) may be equated to n. Doing this in
expression (2.1) the probability of s failures is

$$\frac{n^{s}}{s!}\,p^{n}q^{s} = \frac{m^{s}}{s!}\left(1 - \frac{m}{n}\right)^{n}$$

and the limit of this as n approaches infinity is

$$\frac{m^{s}}{s!}\,e^{-m} \tag{2.3}$$

where e is the exponential base = 2.718 . . .

The expansion of this for different values of s is the Poisson Limit
to the Binomial or the Poisson Series or the Law of Small Numbers.

This distribution is defined entirely by the one constant m, which
is the mean. The other moments are given by the limits to
equations (2.2) as n approaches infinity, and of these it is only important
to note that the second moment, which may be written (1 − m/n)m,
equals the mean.

To calculate the terms of the series for a given value of m, either
the expression (2.3) may be evaluated, with the aid of logarithms,
for s = 0, s = 1, s = 2 and so on, or Soper’s tables in Pearson’s
collection of tables (1931) may be used. These tables give the values
of expression (2.3) to six decimal places for values of m between
m = 0.1 and m = 15.0, and within this range, for all values of s
that have any proportionate frequency.
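In place of logarithms or tables, expression (2.3) may nowadays be evaluated directly. The sketch below (function names are our own) also shows the binomial terms of expression (2.1) approaching the Poisson terms as n grows while the mean number of failures m = nq is held fixed; the value m = 2 is an arbitrary illustration.

```python
from math import comb, exp, factorial


def poisson_term(m, s):
    """Expression (2.3): proportion of sets with s failures when the mean is m."""
    return m**s * exp(-m) / factorial(s)


def binomial_term(n, q, s):
    """Expression (2.1): probability of s failures in n trials, failure probability q."""
    return comb(n, s) * q**s * (1.0 - q) ** (n - s)


m = 2.0  # illustrative mean number of failures per set
for n in (10, 100, 10_000):
    # q shrinks as n grows, so that nq stays equal to m.
    print(n, [round(binomial_term(n, m / n, s), 5) for s in range(5)])
print("limit", [round(poisson_term(m, s), 5) for s in range(5)])
```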


Use of the Binomial and Poisson Distributions for Testing 
Randomness 

2.4. One of the most elementary and direct experimental tests of
the assumption of an infinite population and of the applicability
of the laws of mathematical probability to actual experience consists
in seeing if the binomial distribution describes the variations in the
numbers of successes in sets of trials. Dice have been thrown by
different experimenters many thousands of times in the aggregate,
and the resulting frequency distributions have been compared with
the corresponding theoretical binomial distributions. Other
experiments have taken the form of tossing coins, and thousands of runs
of roulette wheels in casinos have been observed for this purpose.
It cannot be said, however, that these experiments have either proved
or disproved the laws of probability, for when there has been a 
discrepancy between theory and experiment, it has usually been 
possible to find reasons why the particular experiment failed and 
the general applicability of the laws of probability has remained 
unquestioned. We believe that these laws are no more to be proved 
or disproved experimentally than are (say) the terms in a Fourier 
analysis of a series. They are statements of mathematical relation- 
ships that are useful in analysing statistical sampling experience, and 
do in fact describe an important element in that experience. The 
justification for the laws is that they are useful in this way. This is 
the attitude in which the analyses of this section are regarded. 

In many collections of statistical data, individuals are of two kinds
and divided into sets; the binomial distribution may be used to see
if, for any particular collection, the individuals are independent and
the two kinds occur at random in the sets. A special experiment to
illustrate this was conducted by incubating 800 cabbage seeds on
filter paper in rows of ten, and after eight days the number of
germinated seeds in each row was counted. These data were formed
into a frequency distribution in which the variate was the number
of germinated seeds per row and the individuals were rows; this is
given in the second column of Table 2.3. If the germinated seeds
were distributed at random amongst the rows, this distribution would
be expected to be a binomial with n = 10. We do not know the
probability p of a single seed germinating, but may estimate it
from the relationship between the mean $\bar{x}$, n and p given in
equations (2.2). The mean number of germinated seeds per row
is (0 × 6 + 1 × 20 + . . . + 5 × 6) ÷ 80 = 2.175, and the estimate
of p is therefore 2.175 ÷ 10 = 0.2175. The proportionate
frequencies of the expected binomial distribution are given by the
expansion of $(0.2175 + 0.7825)^{10}$, and these multiplied by 80 are
the expected frequencies of Table 2.3. The agreement between the
actual and expected frequencies is quite good,* and as far as may
be judged from this limited experience, the variations in germinated
seeds from row to row were random; the seeds were well mixed
and independent and conditions were uniform, so that there was a
constant probability of any one germinating.


TABLE 2.3

Number of Seeds          Frequency of Rows
Germinated per Row       Actual      Expected

0                        6           6.9
1                        20          19.1
2                        28          24.0
3                        12          17.7
4                        8           8.6
5                        6           2.9
6                        —           0.7
7                        —           0.1

Total                    80          80.0
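The fitting of the binomial to the germination counts may be retraced as follows. This is only a sketch of the arithmetic described above, with variable names of our own choosing; the frequencies are those of Table 2.3.

```python
from math import comb

# Actual frequencies of rows, indexed by the number of seeds germinated (Table 2.3).
actual = [6, 20, 28, 12, 8, 6, 0, 0]
n = 10  # seeds per row

rows = sum(actual)
mean = sum(k * f for k, f in enumerate(actual)) / rows
p = mean / n  # estimate of p from the relation mean = np

# Expected frequencies: terms of the binomial multiplied by the number of rows.
expected = [rows * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

print(rows, mean, p)
for k in range(8):
    print(k, actual[k], round(expected[k], 1))
```

Computed in this way the expected frequencies should agree with the last column of Table 2.3 to within about 0.1 of a row.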

It may be objected that since the expected distribution was 
derived from the actual by choosing p to make the two means equal,
the closeness of agreement signifies nothing. This objection is not 
valid, however, since the two distributions have only been made to 
agree in two respects, mean and total frequency, and they have not 
been made to agree in form; that agreement is a consequence of 
randomness. 

Another test of randomness is that the various moments should 
bear the same relations to each other as those of the theoretical 
binomial distribution given in expression (2.2). The most important 
relationship is that between the second moment and mean. The 
second moment or variance of the actual distribution in Table 2.3,
calculated by the technique of paragraphs 1.31 and 1.32, is 1.744,
and $\bar{x}(1 - \bar{x}/n)$, which may be termed the expected variance, is
1.702; again the agreement is good.
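The comparison of actual and expected variance may be sketched in the same way. We assume here that the second moment is taken about the mean with the number of rows as divisor, since that choice reproduces the figure quoted above; the technique of paragraphs 1.31 and 1.32 is not restated.

```python
# Rows with 0, 1, ..., 5 germinated seeds (Table 2.3).
actual = [6, 20, 28, 12, 8, 6]
n = 10  # seeds per row

rows = sum(actual)
mean = sum(k * f for k, f in enumerate(actual)) / rows

# Second moment about the mean, dividing by the number of rows (an assumption here).
variance = sum(f * (k - mean) ** 2 for k, f in enumerate(actual)) / rows

# Expected variance of a binomial with the same mean and n.
expected_variance = mean * (1 - mean / n)

print(round(variance, 3), round(expected_variance, 3))
```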

* A criterion for judging the closeness of agreement between two
distributions will be given in Chapter IV.

The almost classical example of a Poisson distribution is that given
by counts of yeast cells in the squares of a haemacytometer. The
liquid in which the yeast cells are suspended can be regarded as
consisting of aggregates of molecules of the liquid about equal in
size to the yeast cells which are sparsely distributed among them.
Then the probability that any aggregate taken at random is a yeast
cell (the aggregate being a trial and the yeast cell a failure in
the language of the theory) is extremely small; but there are very
many such aggregates in the liquid under one square in the
haemacytometer (i.e. in the set), so that the mean number of yeast cells
per square is finite. Consequently, if the cells are distributed
independently and at random through the suspending liquid, the
frequency distribution of number per square should be the Poisson
Series. Table 2.4 gives the distribution of the counts in 400 squares
found by “Student” (1907).


TABLE 2.4

Number of Cells      Frequency of Squares
per Square           Observed     Expected

0                    —            3.71
1                    20           17.37
2                    43           40.65
3                    53           63.41
4                    86           74.19
5                    70           69.44
6                    54           54.16
7                    37           36.21
8                    18           21.18
9                    10           11.02
10                   5            5.16
11                   2            2.19
12                   2            0.86
13                   —            0.31
14                   —            0.10
15                   —            0.03
16                   —            0.01

Total                400          400.00


The mean number of cells per square
is 4.68, and from Soper’s tables the Poisson Series having a mean
(m) of the same value is constructed, the separate terms being
multiplied by 400 to give the expected frequencies of the above table.
Thus, for m = 4.6, 0.010052 of the squares should have zero cells
and for m = 4.7 this frequency is 0.009095; hence by linear
interpolation, for m = 4.68 it is 0.010052 − 0.8 × 0.000957
= 0.009286, and this multiplied by 400 gives the expected frequency
of 3.71. The agreement between the two distributions is quite
good. The second moment of the observed distribution is 4.46,
and is nearly equal to the mean.

In general, the Poisson distribution would be expected to apply
where events occur at random and under constant conditions in a
medium (usually space or time) that may be divided into a number
of equal zones, provided that relative to the size of the zone the
medium is continuous (i.e. there is a very large number of elemental
units of medium per zone) and the events are rare “points,” i.e.
there is room in each zone for an exceedingly large number of events
although the actual number in any zone is moderate. Examples are
the distribution of microscopic and ultra-microscopic particles and
bacteria in liquids, the numbers of α-particles emitted from
radioactive substances in intervals of time, and counts of weeds or pests
in given areas of land in agricultural field trials.
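The yeast-cell calculation of Table 2.4 may be sketched without recourse to tables or interpolation, by evaluating expression (2.3) at the observed mean. The variable names are our own, and the second moment is again taken with the total frequency as divisor, which is an assumption that reproduces the quoted value.

```python
from math import exp, factorial

# Observed numbers of squares containing 0, 1, ..., 12 yeast cells (Table 2.4).
observed = [0, 20, 43, 53, 86, 70, 54, 37, 18, 10, 5, 2, 2]

squares = sum(observed)
mean = sum(k * f for k, f in enumerate(observed)) / squares
variance = sum(f * (k - mean) ** 2 for k, f in enumerate(observed)) / squares

# Poisson terms for the observed mean, multiplied by the number of squares.
expected = [squares * mean**k * exp(-mean) / factorial(k) for k in range(17)]

print(squares, round(mean, 2), round(variance, 2))
print([round(e, 2) for e in expected])
```

Direct evaluation differs from the linearly interpolated figures only in the later decimal places.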

When the binomial and Poisson distributions fit any given experi- 
mental data, it is inferred that the variations are due to chance and 
cannot be reduced or controlled unless the whole character of the 
data can be altered by introducing some method of selecting indi- 
viduals of a given kind. It does not always happen, however, that 
there is good agreement between theory and experiment. Where 
there are discrepancies, the plain fact is that the incidence of the 
event is not random, but the matter is not often allowed to rest 
there. The investigator usually tries to find the cause of the lack 
of randomness, and this extends beyond the realm of statistics. 
Sometimes the cause may be some fault in sampling technique that 
may be corrected; sometimes the search for the cause may lead to 
a discovery of scientific value. A frequent kind of discrepancy is that 
in which the actual variance is greater than the expected, and this 
is usually interpreted by assuming that conditions are not uniform, 
regarding the probability of the event as varying from one set of 
trials or zone to another. Then the actual distribution is composed 
of several binomial or Poisson distributions superimposed, and the
actual variance is equal to the expected variance plus that due to 
the fluctuations in the probabilities. In such instances, the experience 
is not merely dismissed as being non-random, but the variation is 
analysed into two parts, systematic and random. The principles of 
such analyses will be dealt with in Chapter VI. 
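The effect of non-uniform conditions on the variance may be illustrated by superimposing two Poisson distributions. The example is entirely hypothetical (half the zones with a mean of 2 and half with a mean of 6) and the names are our own; it is a sketch of the statement above, not of the analyses of Chapter VI.

```python
from math import exp, factorial


def poisson_term(m, s):
    """Proportion of zones with s events when the mean per zone is m."""
    return m**s * exp(-m) / factorial(s)


# Hypothetical illustration: half the zones have mean 2, half have mean 6.
means = [2.0, 6.0]
weights = [0.5, 0.5]

# Terms of the superimposed distribution (80 terms leave a negligible remainder).
terms = [sum(w * poisson_term(m, s) for m, w in zip(means, weights)) for s in range(80)]
mean = sum(s * t for s, t in enumerate(terms))
variance = sum((s - mean) ** 2 * t for s, t in enumerate(terms))

# Variance due to the fluctuation of the zone means themselves.
between = sum(w * (m - mean) ** 2 for m, w in zip(means, weights))

# For a single Poisson series the expected variance would equal the mean.
print(mean, variance, mean + between)
```

The actual variance exceeds the mean by exactly the variance of the zone means, which is the decomposition into random and systematic parts referred to in the text.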

There are kinds of discrepancy other than that mentioned, 
and these lead to other analyses, all of which include a random 
element.

Readers who are interested in further examples of the uses of the
binomial and Poisson distributions are referred to the following
references: Cochran (1936), Fisher (1936a), Przyborowski and
Wileński (1935), and Tippett (1934).


Normal Distribution 

2.5. The Normal distribution may be derived mathematically as
another limiting form of the binomial that is approached as n
becomes very large, both p and q remaining finite. A binomial
distribution may be represented by a histogram with each group centred
over the value of the variate corresponding to the number of
occurrences per set; there are (n + 1) groups and the outline of the
diagram is of the characteristic stepped form. As n, and hence the
number of groups, increases, it becomes necessary to reduce the
scale of the variate to keep the diagram within reasonable
dimensions and so the steps in the outline become smaller. If this process
continues indefinitely, it may easily be imagined that in the limit
the steps in the outline coalesce to form a smooth curve; this is the
frequency curve of the Normal or Gaussian distribution. Since it
is a continuous curve, it necessarily has a continuous variate. Its
equation may be written

$$y = \frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\frac{(x-m)^{2}}{\sigma^{2}}}$$

where e is the exponential base = 2.718 28 . . . , x is the variate,
and N, m and σ are constants. It will be seen later why the equation
is written in this particular form and the $\sqrt{2\pi}$ and $\tfrac{1}{2}$ are not
incorporated in the constants. The derivation of this equation from
the binomial is purely an algebraic process.

This curve is that given in Fig. 2 for β₁ = 0, β₂ = 3 and in
Fig. 3. It extends between x = +∞ and x = −∞, since only at
those extremes does y = 0, and it is symmetrical about an ordinate
at x = m, i.e. about the ordinate at O in Fig. 3.
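The approach of the binomial histogram to the normal curve may be examined numerically. In the sketch below the normal curve is given the same mean np and standard deviation √(npq) as the binomial, and the largest discrepancy between a binomial term and the corresponding ordinate is expressed as a fraction of the central ordinate. The function names and the value p = 0.3 are our own illustrative choices.

```python
from math import comb, exp, pi, sqrt


def normal_ordinate(x, m, sigma, total=1.0):
    """Ordinate y of the normal curve with total frequency `total`."""
    return total / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - m) / sigma) ** 2)


def relative_gap(n, p):
    """Largest difference between binomial term and normal ordinate, relative to the peak."""
    q = 1.0 - p
    m, sigma = n * p, sqrt(n * p * q)
    peak = normal_ordinate(m, m, sigma)
    gap = max(
        abs(comb(n, k) * p**k * q ** (n - k) - normal_ordinate(k, m, sigma))
        for k in range(n + 1)
    )
    return gap / peak


for n in (10, 100, 1000):
    print(n, round(relative_gap(n, 0.3), 4))
```

The discrepancy shrinks as n increases, which is the numerical counterpart of the steps in the outline coalescing into a smooth curve.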

This frequency curve is deduced from a histogram and
consequently areas under the curve and not heights of ordinates represent
frequencies. It is therefore appropriate to use the notation and ideas
of the integral calculus, to imagine about any given value of x an
elemental sub-range dx, and to regard the area under the curve
between ordinates drawn at the limits of the sub-range as an element
of frequency, df (say). Then, this elemental strip may be regarded
as a rectangle of height y and

$$df = y\,dx = \frac{N}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\frac{(x-m)^{2}}{\sigma^{2}}}\,dx \tag{2.4}$$

It is now possible to find the various moments of (2.4). The
total area under the curve is the total frequency, and integrating
(2.4) between the limits of ±∞ this is found to be N. Hence
N in (2.4) is the total frequency. The other moments may be found
from equation (1.3), p. 38, substituting df for the frequency in
the sub-group, and integrating instead of summing. Thus the mean is

$$\frac{1}{N}\int_{-\infty}^{\infty} x\,df = m$$

so that the constant m is the mean of the distribution. Similarly
the other moments may be found, and the first four are:

$$
\begin{aligned}
\text{mean} &= m,\\
\mu_2 &= \sigma^{2},\\
\mu_3 &= 0, \qquad \beta_1 = 0,\\
\mu_4 &= 3\sigma^{4}, \qquad \beta_2 = 3.
\end{aligned}
$$

These expressions for the mean and variance cannot be deduced
from those given in (2.2) for the binomial distribution by writing
n = ∞, for they would both become infinite; this is balanced in
the derivation of the curve by the ultimate reduction in the scale
of the variate, referred to above. The ratios β₁ and β₂, however, are
pure numbers, and by putting n = ∞ in (2.2) they reduce to the
values given above for the normal curve.
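The moments of (2.4) may also be verified by carrying out the integrations numerically. The sketch below uses the simple midpoint rule over a range of ten standard deviations on either side of the mean; the function name, the number of steps and the illustrative constants are all our own assumptions.

```python
from math import exp, pi, sqrt


def normal_moments(total, m, sigma, steps=20_000, span=10.0):
    """Total frequency, mean and central moments of (2.4) by the midpoint rule."""
    lo, hi = m - span * sigma, m + span * sigma
    dx = (hi - lo) / steps
    xs = [lo + (i + 0.5) * dx for i in range(steps)]
    # Element of frequency df = y dx for each elemental strip.
    df = [
        total / (sqrt(2 * pi) * sigma) * exp(-0.5 * ((x - m) / sigma) ** 2) * dx
        for x in xs
    ]
    area = sum(df)
    mean = sum(x * f for x, f in zip(xs, df)) / area
    mu = {r: sum((x - mean) ** r * f for x, f in zip(xs, df)) / area for r in (2, 3, 4)}
    return area, mean, mu[2], mu[3], mu[4]


# Illustrative constants only.
area, mean, mu2, mu3, mu4 = normal_moments(total=1078, m=67.6976, sigma=2.7048)
print(area, mean, mu2, mu3, mu4 / mu2**2)
```

The last quantity printed is the ratio β₂, which should be very close to 3 whatever constants are chosen.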


Determination of Normal Frequencies 

2.51. Apart from the total frequency, N, any given normal
distribution is completely characterised by the two constants m and σ.
We are interested in calculating proportionate frequencies between
various limits of x. This could be done by integrating (2.4) between
the limits if it were possible to evaluate the integral, but as it is
not, ordinates must be calculated, and the integration performed by
quadrature or by some graphical means.

For example, in Fig. 3 the proportionate frequency having values
less than a value denoted by G on the variate scale is the area under
the curve to the left of GH, expressed as a fraction of the total area
under the curve; the proportion between G and E is the area between
GH and EF; the proportion greater than E is the area to the right
of EF. This process of integration by quadrature is possible, but
laborious. However, tables have been calculated, and these reduce
the labour considerably.

A complete set of tables for a range of values of m and σ would
be enormous, but by performing a mathematical transformation all
normal curves reduce to a standard form. The transformation
consists in measuring the variate as a deviation from the mean, divided
by the standard deviation. If we write

$$w = \frac{x - m}{\sigma} \tag{2.5}$$

then for a sample of 1 individual (2.4) becomes

$$df = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}w^{2}}\,dw \tag{2.6}$$

This may be termed the standardised form of the normal curve,
and has a mean of zero and a standard deviation of unity. The
probability integral at w is the area under the curve to the left of
an ordinate drawn at w and may be written

$$A_w = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}t^{2}}\,dt$$

Very complete tables of $A_w$ and z for positive equally spaced values
of w have been calculated by Sheppard and are included in Tables
for Statisticians and Biometricians, Vol. I (Pearson, 1931), where
our w is called x and our $A_w$ is called ½(1 + α).* To find the
probability integral for a negative value of w, use is made of the
fact that the distribution (2.6) is symmetrical about an ordinate at
w = 0. If $A_{-w}$ is the integral at −w and $A_w$ is the integral at w,
it is easy to see that

$$A_{-w} = 1 - A_w \tag{2.7}$$


There are also tables that give values of w for equally spaced values
of $A_w$, while others give w for equally spaced values of $2(1 - A_w)$.
If OE in Fig. 3 represents a deviation +w and OG a deviation −w
(OE = OG and since O is at the centre of symmetry it represents
w = 0) the area to the left of EF is $A_w$ and the “tail” to the right
is $(1 - A_w)$. Similarly, by symmetry the area of the “tail” to the
left of HG is $(1 - A_w)$ and so $2(1 - A_w)$ is the sum of the two
“tails” beyond the deviations +w and −w. The probability
integral is sometimes tabulated in this form because it is of interest 
in sampling theory; this is the method of presentation used by 
Fisher (1936^). 

To find the probability integral for any value of an actual variate
(x in our notation) it is transformed to w by equation (2.5) and the
corresponding integral is that required. Thus, if m = 67.6976
inches and σ = 2.7048 inches and it is required to find the
probability integral at x = 60.5 inches say, then

$$w = \frac{60.5 - 67.6976}{2.7048} = -2.6610$$

From Sheppard’s tables the value of the integral at a deviation of
2.66 is 0.99609 and at 2.67 it is 0.99621; so the first difference is
0.00012, and by linear interpolation the integral for w = +2.6610 is

$$A_w = 0.99609 + 0.00012 \times 0.10 = 0.99610.$$

Hence, from equation (2.7), the integral for w = −2.6610 is
0.00390.

The proportionate frequency between any two values of the
variate may be found by taking the difference of the two
corresponding integrals. Thus, in the above example the integral
corresponding to x = 59.5 is found to be 0.00122 and the
proportionate frequency between 59.5 and 60.5 inches is
0.00390 − 0.00122 = 0.00268.

* In this book we shall continue to use our own notation, even when
referring to these tables.
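Where Sheppard’s tables are not to hand, the probability integral may be evaluated from the error function, to which it is related by $A_w = \tfrac{1}{2}[1 + \operatorname{erf}(w/\sqrt{2})]$. The sketch below retraces the worked example; the function names are our own, and the mean and standard deviation are those quoted in the text.

```python
from math import erf, sqrt


def probability_integral(w):
    """Area under the standardised normal curve to the left of w."""
    return 0.5 * (1.0 + erf(w / sqrt(2.0)))


m, sigma = 67.6976, 2.7048  # mean and standard deviation in inches, from the text


def integral_at(x):
    """Probability integral at a height of x inches, via the transformation (2.5)."""
    return probability_integral((x - m) / sigma)


print(round((60.5 - m) / sigma, 4))
print(round(integral_at(60.5), 5), round(integral_at(59.5), 5))
print(round(integral_at(60.5) - integral_at(59.5), 5))
```

Results obtained in this way may differ from the interpolated tabular values by a unit or so in the fifth decimal place.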

A normal frequency distribution may be “fitted” to an actual 
distribution by putting m and σ equal to the computed mean and
standard deviation respectively and then finding the frequencies in 
the sub-groups from the normal probability integrals. The propor- 
tionate frequency calculated in the above example is for the second 
group of the distribution of heights of fathers in Table 1.5 and 
has been calculated in this way; the process is completed in the 
later columns of that table. Column (8) gives values of w corre- 
sponding to the limits of the sub-ranges, column (9) gives proba- 
bility integrals, in column (10) these are converted to frequencies 
by multiplying by the total N = 1078, and the normal or “expected”
frequencies in column (11) are the differences of the values in the
previous column. These may be compared with the actual fre- 
quencies, «(, in column (3). In order to plot the curve we must 
find the ordinates. Sheppard’s tables give values of z, the ordinates 
of the standardised curve corresponding to the deviations w (see 
equation 2.6), and these are given in column (12) of Table 1.5. The 
ordinates of the actual curve are obtained by the transformation

$$y = \frac{zN}{\sigma} = 398.55\,z$$

and these are in column (13). In Fig. 4 the curve drawn from these 
ordinates is superimposed on the histogram. 


Sometimes a frequency or proportionate frequency is given, and it
is desired to find the corresponding value of the variate. The value
of the standardised variate w may be found from Sheppard’s tables,
and hence, knowing m and σ, the actual variate may be calculated from
equation (2.5).

For example, suppose it is required to know for the data of
Table 1.5 the limit of height such that 20 per cent of the fathers
are shorter than the limit and 80 per cent are taller, assuming the
distribution to be normal with mean and standard deviation equal
to the values already computed. Then A = 0.2, and since this is
less than 0.5 it corresponds to a negative value of w; we must
therefore first find w corresponding to A = 1 − 0.2 = 0.8. From
Sheppard’s tables, the value of the variate at which A = 0.799546
is 0.84 and the value at which A = 0.802338 is 0.85. Hence, by
linear interpolation, the value at which A = 0.8 is

$$\frac{0.8 - 0.799546}{0.802338 - 0.799546} \times 0.01 + 0.84 = 0.84163.$$

Hence the value of w at which A = 0.2 is −0.84163 and if this
is substituted in (2.5) the limit of height is found to be 65.421 inches.
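The inverse problem may be sketched with the `NormalDist` class of Python’s standard `statistics` module, whose `inv_cdf` method returns the value of the variate having a given probability integral. This is a modern convenience standing in for the tables and the interpolation; the constants are those quoted in the text.

```python
from statistics import NormalDist

heights = NormalDist(mu=67.6976, sigma=2.7048)  # values quoted in the text, in inches

w = NormalDist().inv_cdf(0.2)   # standardised deviate with 20 per cent below it
limit = heights.inv_cdf(0.2)    # the same point on the scale of inches

print(round(w, 5), round(limit, 3))
```

The limit found directly is the same as that obtained by substituting w in equation (2.5), namely m + wσ.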



Relation between Normal Frequencies and Standard Deviation 
2.52. We cannot here give a full table of the probability integral
of the normal distribution, but a few important values are given in
Table 2.5. The first entry in this table is obvious; the ordinate at
w = 0 is the axis of symmetry of the curve, and the area up to this
ordinate is half the total area.

The second entry of the variate is the quartile deviation; the
proportionate frequency between positive and negative values of
this deviation is half the total. The ordinates GH and EF drawn in
Fig. 3 are at values of the variate corresponding to w = +2.0 and
−2.0, and the proportionate area of the two “tails” beyond these
limits is 0.04550; i.e. about 5 per cent of the individuals in a
normal population deviate from the mean by twice the standard
deviation or more. Similarly, from the last entry in Table 2.5 we
see that 99.73 per cent of the individuals are contained within a
total range of six times the standard deviation. These data will
assist readers in appreciating the significance of the standard
deviation as a measure of variability when applied to a distribution
that is approximately normal.

The proportional relations between the standard deviation and
other measures of dispersion given above in section 1.23 are true
only for normal distributions, although they are roughly applicable
also to many distributions that are approximately normal.


TABLE 2.5
Normal Distribution

Variate       Probability Integral      Sum of Two “Tails”
w             $A_w$                     $2(1 - A_w)$

0             0.50000                   1.00000
0.67449       0.75000                   0.50000
1.0           0.84134                   0.31731
2.0           0.97725                   0.04550
2.6           0.99534                   0.00932
3.0           0.99865                   0.00270
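The entries of Table 2.5 may be recomputed as a check. The sketch below again relies on `NormalDist` from Python’s standard `statistics` module; the helper name `two_tails` is our own.

```python
from statistics import NormalDist

standard = NormalDist()  # mean zero, standard deviation unity


def two_tails(w):
    """Sum of the two tails beyond the deviations +w and -w."""
    return 2 * (1 - standard.cdf(w))


for w in (0.0, 0.67449, 1.0, 2.0, 2.6, 3.0):
    print(f"{w:<8} {standard.cdf(w):.5f} {two_tails(w):.5f}")
```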

Practical Applicability of the Normal Distribution 
2.53. The Normal distribution is a continuous curve, and first we
must discuss the applicability of frequency curves in general to
practical data.

It is assumed that for most infinite populations with a continuous
variate, the frequency distribution may be represented by a
continuous frequency curve. This is an extrapolation of the practical
experience that as the size of the sample is increased, the sub-ranges
of the distribution may be reduced and the outline of the histogram
usually becomes more regular. Where a form of curve can be assumed,
the parameters may be estimated from a large sample, and a curve
be fitted to the actual distribution, as has been done for Table 1.5.

The normal curve has also been deduced mathematically as the 
distribution that results from the combination of an infinite number
of very small random errors, and this fact together with the fact 
that the distribution is a special case of the binomial gives it a 
fundamental status. It is believed by some to represent the devia- 
tions due to experimental errors in measurements of a physical 
constant, and a quantity that is distributed according to this law is 
sometimes held to be subject only to chance causes that cannot be 
controlled. Conversely, where a distribution is skew, it is often 
inferred that superimposed on the chance variations are some larger 
ones due to a few important causes that may be controlled. We are 
dubious of this use of the normal distribution. Doubtless normality 
may be made a definition, and hence a test, of “randomness” just
in the way that conformity to the binomial and Poisson distributions 
has been interpreted, but it is doubtful if for the normal curve such 
a course has any practical significance. Frequently, when man has 
done all he can to control the variation in a character and the 
remaining variations are practically random, the resulting distribu- 
tion is far from normal. Experimental errors in physical measure- 
ments are by no means always normally distributed, and some 
quantities like the strength of elements of various natural and 
manufactured materials have essentially skew frequency distribu- 
tions. It may also happen that the distribution of a quantity is to 
all intents and purposes normal, and yet a further degree of control 
may be possible. On the whole, we doubt if the normal distribution 
is of more than empirical value ; it is a convenient means of describing 
any data it happens to fit. 

In presenting this view of the place occupied by the normal curve 
in the statistical scheme, it would be a mistake to underestimate its 
importance. The curve does in fact represent closely a very large 
number of experimental distributions and approximately represents 
many more. It is important, too, because it is the distribution on 
which most of the statistical theory of errors is based. 

Non-normal Curves 

2.6. There are several systems of smooth curves for describing
data which do not follow the normal law, of which Pearson’s is
probably the most useful; all the curves in Fig. 2 are of this system.
The type is decided upon from the values of the two β ratios, and
then these, together with the standard deviation and mean, are used 
to find the unknown parameters in the equation. It is not often 
in practical work, however, that any useful purpose is served by 
fitting such curves (their chief use is in sampling theory and in 
actuarial work), so we will merely refer the interested reader to some 
such book as Elderton’s Frequency Curves and Correlation. 

Sampling Distributions 

2.7. A sampling distribution results if a large number of finite
random samples are taken from an infinite population, some single 


TABLE 2.6 

Frequency Distributions 



27- 

30- 

33- 

36- 

39- 

42- 

45- 

48- 

51- 

Individual observations , . 
Means of 5 

B 

i 

1 

■ 

1 

7 

3 






54- 

57- 

6o~ 

63- 

66- 

69- 

72- 

Total 

13 

5 

8 

I 

7 

3 

I 

3 

I 

100 

20 


characteristic of each sample is computed and the computed values
are formed into a frequency distribution. For example, the
observations of Table 1.3 may be regarded as forming twenty samples of
five observations from one population. The means of these samples 
are given in Table 1.3, and are formed into a frequency distribution 
in Table 2.6. This table also gives the distribution of the individual 
observations. 

The distribution of means is called the sampling distribution of the
mean. Like any other frequency distribution it has a mean, and a 
standard deviation. The standard deviation of such a distribution 
is called the standard error of the mean. 

Other constants of the samples of five could have been calculated,
for example the standard deviations or the medians. Each of these
would have had its own sampling distribution and standard error.

Most sampling distributions in common use may be deduced 
mathematically by applying the laws of probability, but given 
sufficient time and energy one could determine them experimentally 
by making up an artificial population (say) by writing numbers on 
cards, shaking up in a bag, drawing samples and calculating the 
means or other statistical constants much in the way in which 
Table 2.6 was constructed from Table 1.3, except that the scale of 
the experiment would need to be much larger. This kind of tech- 
nique is actually used when the distribution cannot be deduced 
mathematically. 
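The experimental determination of a sampling distribution lends itself to mechanical imitation. The sketch below makes up a hypothetical artificial population of 1,000 numbered “cards,” draws 2,000 samples of five with replacement (so that the population behaves as if infinite), and compares the standard deviation of the sample means with that of the individuals. All the numbers, names and seeds are our own assumptions, chosen only for illustration.

```python
import random
from statistics import mean, pstdev


def sample_means(population, size, draws, seed=0):
    """Draw `draws` random samples of `size` (with replacement) and return their means."""
    rng = random.Random(seed)
    return [mean(rng.choices(population, k=size)) for _ in range(draws)]


# Hypothetical artificial population: 1,000 "cards" bearing normally distributed numbers.
rng = random.Random(1)
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]

means = sample_means(population, size=5, draws=2000)

# Standard deviation of individuals, and standard deviation of the means of five.
print(round(pstdev(population), 2), round(pstdev(means), 2))
```

The second figure is an experimental estimate of the standard error of the mean of five observations and should be markedly smaller than the first.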

Sampling Distribution of Mean 

2.71. We are particularly concerned here with the sampling
distribution of the mean in samples of size N drawn from an infinite
population in which the individuals are normally distributed. This
is itself a normal distribution with a mean equal to the mean of the
individuals in the population and a standard error of

$$\frac{\sigma}{\sqrt{N}}$$

where σ is the standard deviation of the individuals.

It will be noticed that as N increases, the standard error of the 
mean decreases. This is shown in Fig. 5, where there are given the 
sampling distributions of the means of samples of 1, 4, 16, 25
and 100 from a population in which the individuals are normally
distributed with an unspecified standard deviation. This population
distribution is that in Fig. 5 for N = 1. For the larger samples,
there is less dispersion about the population mean. This is the
statistical demonstration of the common experience that a large
sample gives a more accurate representation of the population than
a small one; the increase in precision is measured by a reduction 
in standard error. 

The normal probability integral may be used to calculate propor- 
tionate frequencies of samples having means within given limits, 
and Table 2.5 may be used by reading “deviation of sample mean 
from population mean, divided by standard error” for “variate zy.” 
Thus only 0.045 50 or about 1 in 20 of the possible samples have
means that differ from the population value by more than twice the
standard error. 
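As a rough check on the figures in this section, the following sketch computes the standard error of the mean for the sample sizes mentioned and the proportion of samples whose means deviate by more than a given multiple of the standard error. The function names are illustrative, and the population standard deviation of 1 is an arbitrary assumption; the normal probability integral is obtained from the complementary error function in Python's standard library instead of from Table 2.5.

```python
import math


def standard_error_of_mean(sigma, n):
    """Standard error of the mean of n observations: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def two_tailed_probability(z):
    """Proportion of sample means deviating by more than z standard errors
    (in either direction) under a normal sampling distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


sigma = 1.0  # assumed population standard deviation, for illustration only
for n in (1, 4, 16, 25, 100):
    print(f"N = {n:3d}: standard error = {standard_error_of_mean(sigma, n):.2f}")

print(f"beyond twice the standard error: {two_tailed_probability(2.0):.5f}")
```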

Deduction of Sampling Distributions of Mean and Standard Deviation 
2.72. The sampling distributions of both the mean and standard
deviation may be deduced together. The proof is taken from a 
paper by Irwin (1931). 



Let the population mean be $\xi$ and the other particulars as stated
above. Also let the variate be $x$ and the values in any one sample
of $N$ be $x_1, x_2, \ldots, x_s, \ldots, x_N$. Since the population
distribution is continuous, we cannot state what is the probability of
a value $x_s$, for an ordinate of the frequency curve drawn at $x_s$ has
no area, but the probability of a value lying within an elemental
range $dx_s$ about a value $x_s$ is

$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_s-\xi)^2}{2\sigma^2}}\, dx_s.$$

Applying the second rule of probability, the probability of a sample
having values lying within ranges $dx_1, dx_2, \ldots$ of $x_1, x_2, \ldots$,
which we may shortly describe as “the probability of the sample,” is

$$\frac{1}{(2\pi)^{N/2}\sigma^{N}}\, e^{-\frac{(x_1-\xi)^2 + (x_2-\xi)^2 + \ldots + (x_N-\xi)^2}{2\sigma^2}}\, dx_1\, dx_2 \ldots dx_N,$$

or

$$\frac{1}{(2\pi)^{N/2}\sigma^{N}}\, e^{-\frac{1}{2\sigma^2}\sum_{s=1}^{N}(x_s-\xi)^2}\, dx_1\, dx_2 \ldots dx_N.$$

Now if $\bar{x}$ and $s$ are the mean and standard deviation in the sample,*
according to sections 1.33 and 1.31,

$$x_1 + x_2 + \ldots + x_N = N\bar{x} \quad\text{and}\quad x_1^2 + x_2^2 + \ldots + x_N^2 - N\bar{x}^2 = Ns^2.$$

Substituting these in the exponent of the above expression, the
probability of the sample is

$$\frac{1}{(2\pi)^{N/2}\sigma^{N}}\, e^{-\frac{N}{2\sigma^2}\left\{s^2 + (\bar{x}-\xi)^2\right\}}\, dx_1\, dx_2 \ldots dx_N.$$

Many of the possible samples from this population will have a
mean value $\bar{x}$ and a standard deviation $s$, and to find the probability
of a sample with these two constants lying within ranges $d\bar{x}$ and $ds$
of $\bar{x}$ and $s$ we must apply the first probability rule and integrate
the above expression for all samples having these two constants.
This is a mathematical step involving a transformation to polar
co-ordinates in $N$-dimensional space and a subsequent integration,
and leads to the following expression for the probability of a mean
of $\bar{x}$ and a standard deviation of $s$:

$$df = K\, s^{N-2}\, e^{-\frac{Ns^2}{2\sigma^2}}\, e^{-\frac{N(\bar{x}-\xi)^2}{2\sigma^2}}\, d\bar{x}\, ds$$

where $K$ is a constant. This is the probability or proportionate
frequency distribution of the two statistical constants. Since the
term containing $\bar{x}$ does not contain $s$, and vice-versa, the converse
of the second rule of probability implies that $\bar{x}$ and $s$ have
independent probabilities, and the two distributions may be written
separately:

$$df = K_1\, e^{-\frac{N(\bar{x}-\xi)^2}{2\sigma^2}}\, d\bar{x}$$

and

$$df = K_2\, s^{N-2}\, e^{-\frac{Ns^2}{2\sigma^2}}\, ds. \qquad (2.8)$$

By integrating the first expression over the whole range and equating
the result to unity, the constant $K_1$ is found to be

$$\frac{\sqrt{N}}{\sqrt{2\pi}\,\sigma}.$$

The distribution of the mean is a normal one with mean equal
to $\xi$ and standard deviation equal to $\sigma/\sqrt{N}$. That of the standard
deviation is more complicated.

* For the first time it is necessary to distinguish clearly between the
population value of a statistical constant and the value estimated from a
sample. Usually we shall denote population values by Greek letters and the
sample values by corresponding italic letters. This is why we have used $\xi$
instead of m in the equation for the normal distribution.
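The two steps that the text states without working can be written out as follows. This is a sketch of the standard argument, not a transcription of Irwin's proof; the substitution variable $t$ is introduced here only for the illustration.

```latex
% Step 1: the exponent of the sample probability in terms of \bar{x} and s.
\sum_{s=1}^{N}(x_s-\xi)^2
  = \sum_{s=1}^{N} x_s^2 - 2\xi\sum_{s=1}^{N} x_s + N\xi^2
  = (Ns^2 + N\bar{x}^2) - 2N\xi\bar{x} + N\xi^2
  = N\left\{ s^2 + (\bar{x}-\xi)^2 \right\}.

% Step 2: the constant K_1, putting t = \sqrt{N}(\bar{x}-\xi)/\sigma.
1 = K_1 \int_{-\infty}^{\infty} e^{-\frac{N(\bar{x}-\xi)^2}{2\sigma^2}}\, d\bar{x}
  = K_1 \frac{\sigma}{\sqrt{N}} \int_{-\infty}^{\infty} e^{-t^2/2}\, dt
  = K_1 \frac{\sigma\sqrt{2\pi}}{\sqrt{N}},
\qquad\text{so}\qquad
K_1 = \frac{\sqrt{N}}{\sqrt{2\pi}\,\sigma}.
```

The value of $K_1$ is exactly the constant of a normal curve with standard deviation $\sigma/\sqrt{N}$, which is why the distribution of the mean is normal with that standard error.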



CHAPTER III 


ERRORS OF RANDOM SAMPLING AND STATISTICAL 

INFERENCE 

Much statistical work is concerned with attempts to learn some- 
thing of the characteristics of populations from measurements made 
on finite representative samples. It is a fact of experience that suc- 
cessive samples from one population usually differ amongst them- 
selves, and therefore in general they differ from the population. 
They are subject to the so-called errors of sampling that bring 
uncertainty into any inferences that may be made regarding the 
population. This chapter is concerned with the theory of statistical 
inference, taking account of these errors; the applications of the 
theory are described for large samples only. 

Random and Representative Samples 

3.1. We shall deal only with simple random samples, in which, as
stated in section 3.12, every individual is independent of every
other. A sample of 1 000 men consisting of 500 pairs of brothers is
not random, for there is a tendency for brothers to be alike and 
there are only 500 independent individuals ; the others are related, 
statistically as well as by blood, to the first 500. Special precautions 
are necessary in practice to obtain satisfactory samples; bales of 
cotton, sacks of corn and the like are seldom uniform, and a small 
quantity taken from one place is not representative of the whole. 
In such circumstances, it is necessary to select the individuals one 
at a time, but where the bulk is well mixed and statistically homo- 
geneous, a group of individuals taken from any part is effectively a 
random sample. 

The theory of sampling describes the relations between samples 
and the infinite population, and this last will differ from the actual 
population or bulk if the sample is biased. In practice, it is the 
bulk that is of interest, and it is desirable to make the sample a
representative one by seeing that every individual in the bulk has 
an equal chance of being included. This is not always easy. For 
example, when taking cotton hairs singly from a bale, there is an un- 
avoidable tendency to take too many of the longer hairs. The method 
of drawing individuals must be independent of their character.
The degree of elaboration necessary in a sampling technique to
ensure randomness and representativeness depends on the nature 
of the material being sampled, and the arrangement of a technique 
is more a matter for the experimentalist than for the statistician. 
Readers are advised, however, that investigation in particular fields 
has often shown that much more care in sampling is required than 
appeared to be necessary at first sight. Although it is impossible to 
give general rules, the following description of two particular 
sampling methods may be useful as illustrations. 

The first method is that applied to sampling cotton hairs. Hairs 
are taken from different parts of the bale, not singly but in tufts 
sufficiently large to avoid the bias towards length. Each tuft is then 
divided roughly into two halves and one half is discarded. The 
other is halved again, one half is discarded and the other is again 
divided. This is repeated until the final reduced tuft contains only 
a few hairs. A number of such reduced tufts is combined to form 
the final sample, every hair of which is measured. If at each division 
the half to be discarded is decided by lot, say by the toss of a coin, 
the final sample will be truly representative. But it will not be a 
simple random sample unless the final reduced tufts contain only 
one hair each. However, we shall show how to treat such complex 
samples in section 10.11.

The second method may often be applied when the population 
to be sampled is large but finite. Suppose we wish to inspect a 
random sample of houses in a large town. The town may be divided 
into wards, and the wards into streets. One ward may be chosen at 
random, a street may be chosen at random from those in the ward, 
and a house from those in a street; this house is a randomly chosen 
individual, and if the whole process is repeated several times, the 
resulting houses are a random sample of the town, provided the 
wards and streets are approximately equal in size. If the wards, the 
streets within each ward, and the houses in each street are all 
systematically numbered, the problem of random selection resolves 
itself into one of choosing a number at random. This process is 
facilitated by Random Sampling Numbers (Tippett, 1927), which 
is a collection of 40,000 numbers from 0 to 9, arranged in random 
order. By combining an appropriate number of digits taken any- 
where from this collection, random numbers of almost any magnitude 
may readily be obtained.

It should be noted that if arrangements are made to represent
each ward equally in the sample or to see that no two houses are
in the same street, the sample is representative but not random. 
Such may be called a stratified sample, and its random errors are 
dealt with in section 10.13. 

The theory of random sampling is sometimes applied to experi- 
mental errors in physical determinations. It is not unreasonable to 
regard the limited number of measurements that may actually be 
made of a physical quantity as a random sample of an infinite 
population of measurements that could conceivably be made under
the same conditions. The average of this population, however, is 
not necessarily the true value of the quantity; it is affected by errors 
of bias due to the particular conditions of the experiment, to such 
factors as the personal error of the observer and idiosyncrasies of 
the apparatus. Since in most physical experiments errors of bias 
are liable to be of the same order of magnitude as random errors, 
the theory of random sampling is of very limited utility in this field. 

The qualities of randomness and representativeness do not depend 
on the size of the sample; a sample of five, if properly chosen, is 
as random and representative as one of five thousand. In this chapter 
we are limiting, not the general theory, but the application of the 
theory to large samples, i.e. samples of at least a few hundreds, and 
this is done to simplify matters. The more exact theory for small 
samples will be given in Chapter V. 

We shall again make the further assumption that the population 
sampled is infinite in the sense that its composition is not appre- 
ciably altered by the abstraction of the sample. This is satisfied, 
practically, if the individuals are drawn one at a time from even a 
very small bulk provided they are returned to the bulk and well 
mixed with the other individuals between each draw. 

Tests of Significance 

3.2. Many scientific investigations involve the employment of the
method of framing working hypotheses and testing them experi- 
mentally. As long as the experiments fail to disprove them, so long 
are the hypotheses accepted. This is the general method by which 
many statistical inferences are made. A hypothetical population of 
certain characteristics is postulated, and if the sample is such that 
it could reasonably have come from that population, the hypothesis 
is accepted. Owing to sampling errors, however, there is no sharp 
dividing line between samples that could have come from the
hypothetical population and those that could not. It is only possible
to give a probability that a sample like the one observed could have 
come from the population. If the probability is low, the hypothesis 
is rejected; if it is high, the hypothesis is accepted and the deviation 
between the sample and postulated population is attributed to 
errors of sampling. The sample may be characterised by one or more 
statistical constants, and in the next paragraph the process will be 
illustrated for the mean of samples. 

If there is some ground for assuming* the population to be normal
with a given mean and standard deviation, the distribution of the 
means of samples of any given size is the sampling distribution 
described in section 2.71, and from the probability integral, the 
probability of a sample mean deviating from the population mean 
by more than any value may be deduced. For example, we see from 
Table 2.5 that the probability of a sample mean differing from the 
population mean by plus or minus three or more times the standard 
error is 0.002 70. This is low, and if any actual sample does differ
in mean from the population value by more than this amount, 
we infer that the deviation may not reasonably be attributed to 
random errors and the hypothesis regarding the population has 
to be abandoned. 

The probability of a random deviation exceeding any given value 
is called the level of significance of the deviation, and is often 
expressed as a percentage. For example, a deviation of ± 3 times
the standard error is on the 0.27 per cent. level of significance and
one of ± 1.0 times the standard error is on the 32 per cent. level.

By way of example we will assume the lengths of 4 000 hairs of 
an Indian cotton given by Koshal and Turner (1930) to be an 
infinite population.† Their mean length is 2.33 cm. and the standard
deviation is 0.480 6 cm. The first thousand hairs were selected by
a different method from the rest and gave a mean of 2.54 cm. Is
this deviation compatible with the hypothesis that the 1 000 are a
random sample from the 4 000 and that the difference in means is
due to random errors, or is the difference large enough to indicate
that the change in technique has had an effect? The standard error
of the mean is

$$\frac{0.480\,6}{\sqrt{1\,000}} = 0.015\,2,$$

and the deviation of 0.21 cm. is over 13 times the standard error.
Sheppard’s normal probability tables show that this is far beyond
the 0.000 001 level of significance, and the hypothesis is untenable.

* There is no fundamental distinction between an hypothesis and the
assumptions (e.g. normality). However, a good test is sensitive to falseness
in the hypothesis, which is what we want to test, and comparatively
insensitive to errors in the assumptions, which are subsidiary.

† This assumption is an approximation, for 4 000 is not infinitely large
compared with 1 000, the size of the sample we are testing.
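The arithmetic of the cotton-hair example can be retraced with a short sketch. The figures are those quoted in the text; the variable names are illustrative, and the probability is computed from the complementary error function, not read from Sheppard's tables.

```python
import math

population_mean = 2.33   # cm, mean length of the 4 000 hairs
population_sd = 0.4806   # cm, standard deviation of the 4 000 hairs
sample_mean = 2.54       # cm, mean of the first 1 000 hairs
n = 1000                 # size of the sample under test

standard_error = population_sd / math.sqrt(n)
deviation = sample_mean - population_mean
ratio = deviation / standard_error

# Two-tailed probability of a random deviation at least this large.
level = math.erfc(abs(ratio) / math.sqrt(2))

print(f"standard error of the mean: {standard_error:.4f} cm")
print(f"deviation / standard error: {ratio:.1f}")
print(f"below the 0.000 001 level:  {level < 1e-6}")
```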

Any deviation that is large enough and is on a sufficiently low 
probability level to lead to a rejection of the hypothesis regarding 
the population is said to be statistically significant and the mean of 
the sample is said to be significantly different from that of the
population. The question arises: at what probability level does a deviation
become significant? There is no rational probability level at which
possibility ceases and impossibility begins, but it is conventional
to regard a probability of 0.05 as the critical level of significance.
The considerations that govern the choice of this level will be
discussed in section 3.3, and it is sufficient here to state that this
convention has been found to give a satisfactory rule of action in
most circumstances. It will be seen from Table 2.5 that a deviation
of twice the standard error corresponds roughly to a probability
level of 0.05, so we have the following working rule:

A deviation of a sample mean from an assumed population 
value of twice the standard error lies on the 0.05 (or 5 per cent.)
level of significance, and a deviation greater than this amount is 
statistically significant. 

As a measure of dispersion of the sampling distribution of sample 
means, the quartile deviation is sometimes used instead of the 
standard error, and is called the probable error. The probable error 
is 0.674 49 times the standard error, and three times the probable
error is roughly equivalent to twice the standard error. The probable
error has no particular advantages, and as it involves the troublesome
factor 0.674 49, it is rapidly going out of use.


Significance of Difference between Two Sample Means 
3.21. The most common situation is one in which the investigator
has two samples, and wishes to know if their differences are real or 
may be attributed to errors of random sampling. Again we shall 
confine attention to the two means. The appropriate hypothesis is
that the two samples are from populations having the same mean,
and the probability level of the observed difference is calculated
accordingly. Again the hypothesis is accepted if the level is fairly
high and there is no statistically significant difference between the
means; if the level is low (say below 0.05), the hypothesis is rejected
and the difference is real. To calculate these probabilities it is 
necessary to know the sampling distribution of the difference 
between two means. 

Let the total numbers of individuals in the two samples be $N_1$
and $N_2$. Then it is possible to imagine a sampling experiment in
which a very large number of pairs of samples are taken from the
respective populations, the number of individuals in the first of each
pair being $N_1$ and the number in the second being $N_2$. For each
sample the mean may be found, and hence for each pair, the
difference between the two means. There will be as many differences
as there are pairs of samples, and these may be formed into
a frequency distribution: the sampling distribution of the difference
between two means. This distribution has been deduced mathematically
and is normal with a mean value of zero, as may be
expected, and a standard deviation (or standard error) larger than
the standard error of either sample mean taken separately. If $SE_1$
is the standard error of the distribution of one series of means and
$SE_2$ is that of the other, the standard error of the distribution of
differences is*

$$SE_{1-2} = \sqrt{(SE_1)^2 + (SE_2)^2}, \qquad (3.1)$$

provided the two samples are independent. Further, if $\sigma_1$ and $\sigma_2$
are the standard deviations of the individuals in the two populations,
the standard error of the difference between the two means is

$$SE_{1-2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}. \qquad (3.2)$$

Usually, $\sigma_1$ and $\sigma_2$ do not differ appreciably, and it is reasonable
to include in the hypothesis the postulate that the two populations
are the same, so that $\sigma_1 = \sigma_2$.

In these circumstances, if the two single samples are of the same
size they have the same standard error, and the “twice the standard
error” criterion leads to the working rule that differences greater
than three times the standard error of a single mean are significant;
for if $SE_1 = SE_2$, $SE_{1-2} = \sqrt{2}\,SE_1$ and $2\sqrt{2}$ is nearly equal to 3.

* This follows easily from the equations in section

In order to compute the standard error, it is necessary to know $\sigma$,
the standard deviation of the individuals in the population. Frequently
this is unknown, and as an approximation an estimate $s$
obtained from the samples is used instead. This practice limits the
application of the theory to large samples. Where there are two
samples and a common standard deviation is assumed, some combined
estimate of $s$ should be used since the hypothesis is that the
samples are from the same population. If $s_1$ and $s_2$ are the two
sample estimates, a suitable combined estimate is

$$s^2 = \frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2}, \qquad (3.3)$$

and substituting this value of $s$ for the values of $\sigma$ in equation (3.2)
we have

$$SE_{1-2} = s\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}. \qquad (3.4)$$

Frequently, however, the standard errors of separate means are
calculated more or less as a routine and it is convenient to use
these directly in equation (3.2) leading to the expression

$$SE_{1-2} = \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}}.$$
This latter course is not consistent with the hypothesis that the 
samples are from the same population, but when the samples are 
large the two courses usually lead to practically the same results.*
The second is used in the following example. 

The British Association Report for 1883 gives on p. 256 distributions
of the heights of men born in England and Scotland; as an
example we will test the significance of the difference between the 
two means. The necessary data are in Table 3.1. 

The difference in height is 1.108 1 inches, and its standard error
is

$$\pm\sqrt{0.032\,38^2 + 0.068\,68^2} = \pm\,0.075\,95.$$

1.108 1 is 14.6 times 0.075 9, and we thus conclude that Scotsmen
are really taller than Englishmen.

* The second course is consistent with the original hypothesis that the
samples are from two populations with the same mean and different standard
deviations. This is a legitimate assumption, but is not usually as acceptable
on other grounds as the one given above.
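The two courses described in this section can be compared on the height data of Table 3.1. The sketch below is illustrative only: the function names are assumptions, the pooled version follows equations (3.3) and (3.4), and the separate version uses each sample's own standard deviation.

```python
import math


def se_difference_separate(s1, n1, s2, n2):
    """Standard error of the difference, each sample keeping its own
    standard deviation (the second course)."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def se_difference_pooled(s1, n1, s2, n2):
    """Standard error of the difference from a combined estimate of the
    standard deviation, equations (3.3) and (3.4)."""
    s_squared = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)
    return math.sqrt(s_squared * (1 / n1 + 1 / n2))


# Table 3.1: number in sample, mean height and standard deviation (inches).
n_eng, mean_eng, sd_eng = 6194, 67.4375, 2.548
n_sco, mean_sco, sd_sco = 1304, 68.5456, 2.480

difference = mean_sco - mean_eng
se_separate = se_difference_separate(sd_eng, n_eng, sd_sco, n_sco)
se_pooled = se_difference_pooled(sd_eng, n_eng, sd_sco, n_sco)

print(f"difference in means:  {difference:.4f} inches")
print(f"separate course: SE = {se_separate:.4f}, ratio = {difference / se_separate:.1f}")
print(f"pooled course:   SE = {se_pooled:.4f}, ratio = {difference / se_pooled:.1f}")
```

Both ratios are far above 2, so for samples of this size the choice between the two courses does not affect the conclusion.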

General Discussion 

3.3. We shall first discuss the choice of the level of statistical
significance. If we work consistently according to the rule that the
hypothesis is acceptable provided the observed difference or deviation
is below a given level and is rejected if the deviation is above,
one of four possible situations may arise*:

We may be in error because we:

(1) Reject a true hypothesis, or

(2) Accept a false one;

or we may be correct because we:

(3) Accept a true hypothesis, or

(4) Reject a false one.

In accepting a hypothesis according to the rule, we only do so
tentatively, since no hypothesis is, or ever can be, finally proved.

TABLE 3.1

Country    Number in   Mean Height   Standard Deviation   Standard Error of Mean
           Sample      (inches)      (inches)
England    6 194       67.437 5      2.548                2.548 ÷ √6 194 = ± 0.032 38
Scotland   1 304       68.545 6      2.480                2.480 ÷ √1 304 = ± 0.068 68

In the long run of statistical experience, the ratio of wrong
inferences under (1) to total inferences under (1) and (3) when the
hypothesis is a true one can be made as low as we please by making
the level of significance high†; indeed, this ratio is the probability
level, and the rule already proposed leads to a true hypothesis being
rejected once in every twenty experiments for which a true hypothesis
is made. By adopting a level of 0.01, this kind of error is
made only once in every hundred experiments. Unfortunately, by
using a higher level of significance, we are more likely to accept
the hypothesis and hence are more likely to be wrong under situation
(2). Thus the choice of the level for a criterion of acceptance or
rejection is a compromise between the risks of the two types of error.

* This analysis was given by Neyman and Pearson (1928), and some
writers have accepted the classification to the extent of referring to
situations (1) and (2) as “errors of the first and second kinds.”

† A low probability corresponds to a high level of significance.
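The compromise between the two kinds of error can be seen in a small sampling experiment. Everything in this sketch is an assumption made for illustration: a normal population with unit standard deviation, samples of 25, a false hypothesis that is wrong by 0.2, and critical ratios of 2.0 and 2.6 standing roughly for the 0.05 and 0.01 levels.

```python
import math
import random


def rejection_rate(true_mean, hypothetical_mean, sigma, n, critical,
                   trials=4000, seed=1):
    """Fraction of simulated samples whose mean deviates from the
    hypothetical mean by more than `critical` standard errors."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    rejected = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs(sample_mean - hypothetical_mean) > critical * standard_error:
            rejected += 1
    return rejected / trials


for critical in (2.0, 2.6):
    # Situation (1): the hypothesis (mean 0) is true but is rejected.
    first_kind = rejection_rate(0.0, 0.0, 1.0, 25, critical)
    # Situation (2): the true mean is 0.2, yet the hypothesis (mean 0) is accepted.
    second_kind = 1 - rejection_rate(0.2, 0.0, 1.0, 25, critical)
    print(f"critical ratio {critical}: first kind {first_kind:.3f}, "
          f"second kind {second_kind:.3f}")
```

Raising the critical ratio lowers the proportion of errors of the first kind and raises the proportion of the second kind, which is the compromise described above.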

Unfortunately, it is not possible to state the proportion of 
erroneous inferences under (2) to the total experiments in which a 
false hypothesis has been made. It will depend to a considerable 
extent on how near the hypothesis is to the truth as well as on the 
stringency of the test. 

The choice of the critical level of significance will depend to some 
extent on circumstances, but it is governed largely by the following 
consideration. We usually have a large body of knowledge and a 
scientific tradition to guide us, so that there is a fairly strong pre- 
disposition towards accepting any hypothesis we make. One of the 
strongest traditions favours simple hypotheses involving few con- 
stants in preference to complex ones involving many constants. For 
example, in investigating a possible difference in height between 
Englishmen and Scotsmen as in paragraph 3.21, it was in accordance 
with scientific practice to prefer the hypothesis of a single mean 
height to one involving two heights, i.e. to assume no difference 
until the contrary was proved. It is because of this, and because the 
effect of an error under (2) is less serious than one under (1) that
the chosen criterion of significance usually has a probability level
as low as 0.05, and in instances where there are exceptionally strong
a priori reasons for preferring the hypothesis or an error under (1)
may have important consequences, the appropriate probability level
is 0.01 or even lower. On the other hand, most hypotheses tend to
be of a negative character in that they are in accordance with existing 
knowledge, and by choosing too low a probability corresponding to 
a level of significance that is too high, the proportion of mistaken 
inferences of the second kind may be too great, and advance of 
knowledge may be unjustifiably impeded. Too much scepticism may 
be obstructive. 

3.31. It can be seen that one means of reducing the effect of
erroneous inferences of the second kind without affecting those of
the first is by reducing the standard error of the statistical constant
under test. For a given level of significance, a smaller standard error
leads to a correspondingly smaller deviation lying just on the chosen
level of significance, and so to a smaller deviation wrongfully dis- 
counted as not significant. The standard error may be reduced by 
increasing the size of the sample, and we shall see in section 3.7 
that from this point of view some statistical constants are preferable 
to others that measure equivalent properties of the frequency 
distribution. 


The statistical significance gives no information as to the magnitude 
or practical importance of any difference; that can only be judged 
by one with technical knowledge of the subject to which these 
methods are applied. A very large sample may make very small and 
unimportant differences overwhelmingly significant, while if the 
sample is small, large and important differences may be obscured by 
random errors. The verdict “not significant,” therefore, is more like
the “not proven” of Scots law than “not guilty.” If an approximate
value of the standard deviations of the populations is available (say
from preliminary samples), it is always possible to estimate how
large samples are necessary to show up a given difference; N should
be amply big enough to make twice the standard error of the difference 
less than the specified one. 
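The estimate of sample size mentioned above can be sketched as follows, under assumptions that are not in the text: two samples of the same size N and a common standard deviation, so that the standard error of the difference is σ√(2/N) as in equation (3.4). The function name and the figures in the example are hypothetical. Since the text asks for N to be "amply" big enough, the result is best read as a lower bound.

```python
import math


def min_sample_size(sigma, difference, multiple=2.0):
    """Smallest N (per sample) for which `multiple` times the standard
    error of the difference, sigma * sqrt(2 / N), is less than `difference`."""
    # multiple * sigma * sqrt(2 / N) < difference
    #   <=>  N > 2 * (multiple * sigma / difference) ** 2
    return math.floor(2 * (multiple * sigma / difference) ** 2) + 1


# Hypothetical figures: standard deviation 2.5 inches, difference of 0.5 inch.
n_required = min_sample_size(sigma=2.5, difference=0.5)
print(f"at least {n_required} individuals are needed in each sample")
```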

If a sample mean is significantly different from a hypothetical 
population value or the difference between two sample means is 
significant, statistical theory can shed no light on the cause of the 
deviation. It may be either that the bulk sampled is really different 
from the hypothetical population or that the sample is not truly 
representative and random; either that there is a real difference 
between the two populations sampled or that there is some 
difference in sampling technique. 

It may be noted that there is nothing in these tests of significance 
that provides a criterion for choosing between hypotheses that may 
be compatible with the data. Such a choice must be made on non- 
statistical grounds. 

3.32. So far we have taken as the 0.05 level of significance a positive
or negative deviation for which the two “tails” of the probability
curve total 0.05, i.e. each “tail” separately is 0.025. This is because
we usually have no a priori reason for deciding that the deviation
or difference should have one sign or another. In the example of
section 3.2 treating of the lengths of cotton hairs, the mean of the
sample of 1 000 happened to be greater than the assumed population;
we should have tested the deviation in the same way had the
mean been less. In the example of section 3.21 the Scotsmen were 
apparently taller, on the average, than the Englishmen; we should 
have tested the difference had the Scotsmen been shorter. In cir- 
cumstances like these, the probabilities must be added for both 
positive and negative deviations or differences, so as to make the 
calculation correspond to practice and to give correctly for the long- 
run of experience the ratio of mistaken to total inferences under 
situations (1) and (3) above. For differences, the argument may be
put in another way which may be more helpful. Our hypothesis is
that the true value is zero, and the question we ask is, “What is the
probability that random sampling will give a difference greater than
that observed?” Now we do not know the sign, and it is purely an
accident whether in comparing two means, $\bar{x}_1$ and $\bar{x}_2$ (say), we take
$(\bar{x}_1 - \bar{x}_2)$ or $(\bar{x}_2 - \bar{x}_1)$, so we may always make the difference
positive (= + d, say). Then in considering the sampling curve we must also
do the same thing and only use the positive half of it, in which
differences can only range from zero to plus infinity. Thus the
probability that of all the positive differences that could be obtained
from random sampling, one would be greater than + d is the ratio
of the area beyond the ordinate at d to the total area of the halved
normal curve. In Fig. 3, OE is twice the standard error, and the
area under the curve to the right of EF is 5 per cent. of the half of
the curve to the right of the origin, O, or 2.5 per cent. of the full
curve. Thus, there may be a distinction between the 5 per cent.
value or point at which an ordinate cuts off a tail of 5 per cent. of
the total area, and the 5 per cent. level of significance.

Sometimes, it is reasonable to regard a deviation corresponding 
to a single “tail” of 0.05 as lying on the 0.05 level of significance.
For example, a strong believer in Scottish superiority might argue 
that Scotsmen cannot possibly be shorter than Englishmen, that 
they must either be equally tall or taller. He would only subject to 
statistical test a difference that showed Scotsmen as taller, and any 
difference of the opposite sign he would dismiss without test as 
being due to random errors. It is consistent with such an attitude 
to regard a difference of 1.65 times the standard error, corresponding
to a single “tail” of 0.05, as lying on the 5 per cent. level of
significance. It is only legitimate to do this, however, if the grounds for
deciding the sign of the difference or deviation (if real) are a priori
and independent of the result given by the sample. 
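The distinction between a single tail and both tails can be checked numerically. The sketch below assumes a normal sampling distribution; the function names are illustrative.

```python
import math


def single_tail(z):
    """Area of the normal curve beyond +z standard errors."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def both_tails(z):
    """Area beyond +z and beyond -z together."""
    return 2 * single_tail(abs(z))


print(f"single tail beyond 1.65: {single_tail(1.65):.4f}")
print(f"both tails beyond 2.0:   {both_tails(2.0):.4f}")
print(f"single tail beyond 2.0:  {single_tail(2.0):.4f}")
```

A ratio of 1.65 cuts off roughly 0.05 in one tail, whereas twice the standard error cuts off roughly 0.05 in the two tails together and only about half that in either tail alone.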

Groups of Samples 

3.33. The probabilities deduced in accordance with the foregoing
theory are only for single pairs of samples taken at random, and when
there are several samples, the problem of significance is more
complicated. If we had a hundred differences whose true value was zero,
we should expect four or five of them to be greater than twice the
standard error, but if we applied the simple theory by “rule-of-thumb,”
we should erroneously report them as real. Similar considerations
must be applied to groups of less than one hundred tests
and we will work out the significance of the biggest of n deviations.

TABLE 3.2

Largest Deviation ÷ Standard Error of n Lying on 0.05 Level of Significance

n     Deviation ÷ Standard Error
1     2.0
2     2.2
4     2.5
6     2.6
10    2.8

Suppose there are n independent differences between pairs of 
means; e.g. the aim may be to see if several treatments have an 
effect on some quantity and one of each pair of samples may be an 
untreated control (a separate one for each pair) and the other be 
given one of the treatments. We wish to test if the biggest ratio
difference ÷ standard error of difference is significant. Let the
probability that errors of random sampling would give a single difference
equal to or greater than d (say) be $P$ (as found in the ordinary way
from probability tables), and let $P_n$ be the probability that the
biggest of n would equal or exceed d. Then $(1 - P_n)$ is the probability
that n differences would be less than d, and $(1 - P)$ that one
random difference would be less; from Rule II (section 2.11) we have

$$(1 - P_n) = (1 - P)^n,$$

whence $P_n = 1 - (1 - P)^n$.

Table 3.2 has been worked out to show what value the largest of 
n deviations (in terms of the standard error) must reach to lie on 
the 0.05 level of significance for the normal distribution. 
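
The entries of Table 3.2 can be reproduced with a short calculation. The following is a minimal sketch, assuming independent deviations, a normal distribution and a probability P that counts both tails; the function name `largest_deviation_level` is hypothetical and is introduced only for illustration.

```python
from statistics import NormalDist

def largest_deviation_level(n, level=0.05):
    """Deviation (in standard errors) that the largest of n independent
    deviations must reach to lie on the given level of significance."""
    # Solve 1 - (1 - P)^n = level for the single-test probability P.
    p_single = 1.0 - (1.0 - level) ** (1.0 / n)
    # P counts both tails of the normal curve, so each tail holds P / 2.
    return NormalDist().inv_cdf(1.0 - p_single / 2.0)

table_3_2 = {n: round(largest_deviation_level(n), 1) for n in (1, 2, 4, 6, 10)}
print(table_3_2)
```

The single-test probability is found first, and the normal deviate is then read off for half of it, which corresponds to looking up P "in the ordinary way from probability tables."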

Sometimes we want to know if the difference between the largest 
and smallest in a group of sample means is significant; such a 
difference is a range. E. S. Pearson (1932) gives some values of 
the range at various levels of significance and Table 3.3 has been 
calculated from them. When there are four samples, a difference of 
2.6 times the standard error is equivalent to one of twice the 
standard error for a pair. 

TABLE 3.3 

Range / Standard Error of a Difference on 0.05 and 0.01 Levels of Significance 

Number of Samples     P = 0.05     P = 0.01 
 2                    2.0          2.6 
 4                    2.6          3.1 
 6                    2.9          3.4 
10                    3.2          3.6 

We shall illustrate this by the data of Table 3.4, which have been 
obtained from Table 10.6 showing the mean corn yield per plot 
for a number of agricultural plots subjected to five treatments. 


TABLE 3.4 

Treatment     Mean Yield per Plot, Grammes 
A             295.2 
B             297.5 
C             276.3 
D             272.2 
E             271.8 

We are given that the standard error of any one mean is 8.712, 
so that the standard error of any random difference between two 
means is 12.32 grammes.* The largest difference is between 
treatments B and E, and is 25.7 or 2.1 times the standard error. For a 
single randomly chosen pair of means, this difference would be 
significant, but here it is the largest difference in a set of five means, 
and from Table 3.3 we deduce that it is well below the 5 per cent 
level of significance. 

* The treatments are "dummies." The standard error is deduced from 
Table 10.71. 
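
The arithmetic of this example may be checked as follows. This is only a sketch using the figures of Table 3.4 and the stated standard error of a mean; the variable names are illustrative.

```python
import math

mean_yield = {"A": 295.2, "B": 297.5, "C": 276.3, "D": 272.2, "E": 271.8}
se_mean = 8.712  # standard error of any one mean, grammes

# Standard error of a difference between two independent means.
se_difference = math.sqrt(2.0) * se_mean

largest = max(mean_yield, key=mean_yield.get)
smallest = min(mean_yield, key=mean_yield.get)
range_of_means = mean_yield[largest] - mean_yield[smallest]
ratio = range_of_means / se_difference

print(largest, smallest, round(range_of_means, 1),
      round(se_difference, 2), round(ratio, 1))
# For five samples the 0.05 value of Table 3.3 lies between 2.6 (four
# samples) and 2.9 (six samples), so this ratio falls short of it.
```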

If it is desired to compare the means of several samples as a whole, 
there are other and more suitable methods that will be described 
in Chapter VI; but if the wish is to compare selected pairs, the 
above considerations show at least that more stringent tests of 
significance should be applied than when a single pair is taken at 
random. 

Applicability of Normal Sampling Theory 
3.4. The correspondence of a deviation of twice the standard error 
to the 0.05 level of significance depends on the sampling 
distribution of the mean being normal. Strictly, this only happens when 
the individuals in the sampled population are normal, but for large 
samples taken from most non-normal populations that are likely to 
be met with in practice, the distribution of the mean is nearly 
normal, and the normal sampling theory is sufficiently close an 
approximation for practical purposes. This frequently is a second 
reason for applying the methods of this chapter to large samples only. 
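
The approach to normality may be illustrated by a small sampling experiment. The sketch below is not from the text: it assumes an exponential population (mean 1, standard deviation 1) as an example of a markedly skew distribution, and the sample size, number of trials and seed are arbitrary choices.

```python
import random
from statistics import fmean

random.seed(1)
N = 200        # individuals in each sample (assumed)
TRIALS = 4000  # number of samples drawn (assumed)

# For an exponential population with mean 1 the standard deviation is
# also 1, so the standard error of the mean of N is 1 / sqrt(N).
standard_error = 1.0 / N ** 0.5

beyond = 0
for _ in range(TRIALS):
    sample_mean = fmean(random.expovariate(1.0) for _ in range(N))
    if abs(sample_mean - 1.0) > 2.0 * standard_error:
        beyond += 1

fraction_beyond = beyond / TRIALS
print(fraction_beyond)  # should lie near 0.05 if the normal theory holds
```

Repeating the experiment with a small N (say 5) would be expected to give a noticeably different fraction, which is the practical reason for confining the method to large samples.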

The population and a sample may be characterised by constants 
other than the mean, and in fact any of the constants mentioned 
in Chapter I may be used. These all have their sampling distribu- 
tions and, if they are known, deviations corresponding to various 
levels of significance may be calculated. For large samples, however, 
most of these distributions tend to be nearly normal, and as an 
approximation it is usual to apply the normal theory by calculating 
the standard error of the constant, and regarding deviations of twice 
this standard error as lying on the 5 per cent level of significance. 
In this chapter, further applications of sampling theory will usually 
follow this procedure, but it is desirable for the sake of subsequent 
applications to discuss the use of skew sampling distributions. This 
discussion follows in the next section. 

Tests of Significance Based on Skew Sampling Distributions 
3.41. For a symmetrical distribution we decided in section 3.32 
to regard the 2.5 per cent points (on the positive and negative 
sides of the population value) as lying on the 5 per cent level of 
significance, and since the two deviations are equal, only one need 
be specified. When the distribution is not symmetrical, some 
modification is obviously necessary, and there are three possibilities. 

(1) We may retain the equality of the positive and negative 
deviations, and regard as lying on the 5 per cent level of significance a 
value such that the areas of the two tails beyond ordinates at these 
points together add up to 5 per cent of the total area. In Fig. 6 
(which is not drawn to any scale, and is only diagrammatic), O is 
the population value, and OA = OA' is the significant deviation, 
the areas beyond the ordinates at A and A' being 5 per cent of the 
total. 

Fig. 6. 

Such a criterion is to be condemned, for it gives positive and 
negative deviations equal importance. 

(2) Extending the argument already applied to differences in 
section 3.32, we may use separate tests for positive and negative 
deviations from the population value. If the deviation under test is 
positive, we use only that part of the sampling distribution on the 
positive side of the ordinate drawn at the population value, and regard 
as lying on the 5 per cent level the deviation at which an ordinate 
cuts off a tail of 5 per cent of the positive part of the curve. Similarly, 
if the deviation is negative we use only the negative part of the 
sampling distribution. In Fig. 6, OB and OB' are the two deviations, 
and the area to the left of the ordinate at B is 0.05 of that to the 
left of OM, while the area to the right of the ordinate at B' is 0.05 
of that to the right of OM. Because OM does not divide the total 
area into equal parts, the two tails are not equal and therefore are 
not 0.025. 

(3) We may regard as lying on the 5 per cent level of significance 
the deviations beyond which the two tails are each equal to 2.5 per 
cent of the total area under the curve. In Fig. 6, C and C' are about 
on these points. The shape of any frequency curve may be altered 
by plotting the abscissas in terms of some function of x (e.g. putting 
x' = log x), and it is theoretically possible to choose the function 
so that the curve reduces to the normal form. Such a procedure 
alters the relative positions of the mean and mode, but not of 
deviations defined by the ratios in which they divide up the area. 
Consequently 2.5 per cent values of the transformed variate x' still 
remain 2.5 per cent values of the untransformed variate x in the 
skew distribution. If the normal curve is regarded as the fundamental 
curve of errors, the deviations lying on the 5 per cent level of 
significance according to this criterion would satisfy the normal 
criterion if the curve were transformed. 
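
The property on which the third criterion rests, that points defined by the proportion of area they cut off are carried over unchanged by a transformation such as x' = log x, may be illustrated numerically. The sketch below assumes a skew (log-normal) sample generated for the purpose; the sample size and seed are arbitrary.

```python
import math
import random
from statistics import fmean

random.seed(2)
x = sorted(random.lognormvariate(0.0, 0.5) for _ in range(2000))
x_log = sorted(math.log(v) for v in x)  # transformed variate x' = log x

lower_index = 49    # 50 of the 2000 values (2.5 per cent) lie at or below it
upper_index = 1950  # 50 of the 2000 values (2.5 per cent) lie at or above it

# The 2.5 per cent points of x' are simply the logs of those of x ...
same_lower = math.isclose(x_log[lower_index], math.log(x[lower_index]))
same_upper = math.isclose(x_log[upper_index], math.log(x[upper_index]))

# ... whereas the mean is not carried over by the transformation.
mean_shift = math.log(fmean(x)) - fmean(x_log)

print(same_lower, same_upper, round(mean_shift, 3))
```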

At the moment, both the second and third uses of skew curves 
for testing the significance of differences appear to be consistent 
extensions of the criteria developed for symmetrical curves, and it 
is difficult to decide between them. Fortunately, the difference in 
results obtained by the two methods is of no practical importance, 
and the last one will probably be most convenient in practice. 

Sampling Errors of Various Constants 
Means of Binomial and Poisson Distributions 

3.51. When developing the binomial distribution in Chapter II 
we regarded the sets of trials as individuals and the number of 
occurrences per set as the variate. If, however, we regard the trials 
as individuals for which the variate can only take one of two 
characteristics, viz. success or failure, the set of n trials becomes a sample 
of n and the binomial distribution becomes a sampling distribution 
of the number of successes per sample. From this point of view, the 
small n of the binomial is equivalent to the large N of this chapter. 
As stated in section 2.5, when the number in the sample is large, 
the binomial distribution approaches the normal and the normal 
sampling theory may be applied. 

For the binomial the standard error of the number of successes 
(l = np) is the standard deviation given in equation (2.2), p. 48, 
and is 

$$\sqrt{np(1 - p)};$$ 

on dividing this by n we obtain 

$$\text{Standard Error of } p = \sqrt{\frac{p(1 - p)}{n}}.$$ 

Darbishire (1904), on crossing waltzing with normal mice, found 
in the F2 generation 458 normals and 97 waltzers (total 555). The 
Mendelian expectation for l is 416 normals with a standard error of 

$$\sqrt{555 \times \tfrac{3}{4} \times \tfrac{1}{4}} = \pm 10.2.$$ 

The difference between the actual and expected number of normal 
mice is 42, and being 4.1 times the standard error is significant; it 
could only arise from random sampling four times in 100 000 trials. 

Parkes and Drummond (1925) give the data of Table 3.5, showing 
the effect of vitamin B deficiency on the sex-ratio of the offspring 
of rats. 

TABLE 3.5 

Diet                    Males   Females   Total Young   Percentage Males 
Vitamin B Deficient     123     153       276           44.57 
Vitamin B Sufficient    145     150       295           49.15 
Totals                  268     303       571           - 

In comparing the sex-ratios, we are not comparing an experimental 
ratio with a theoretical one, but two experimental ones. Hence, we 
do not know the values of p in the infinite population required for the 
calculation of the standard errors; we will use the sample values as 
an approximation. The standard errors of p are 

$$\sqrt{\frac{0.4457 - 0.1986}{276}} = \pm 0.0299 \quad\text{and}\quad \sqrt{\frac{0.4915 - 0.2416}{295}} = \pm 0.0291,$$ 

and that of the difference is ±0.0417, or ±4.17 per cent. The 
difference in sex-ratio is 4.58, and being only 1.10 times its standard 
error is insignificant; it would arise from random errors nearly 
27 times in 100 trials. 
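
Both binomial examples of this section may be verified with a few lines of calculation. The sketch below assumes the 3 : 1 Mendelian ratio for the mice (which gives the expectation of about 416 normals) and uses the two-tailed normal probability; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def two_tail_probability(ratio):
    """Chance of a normal deviation at least this many standard errors
    from zero, in either direction."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(ratio)))

def se_proportion(p, n):
    """Standard error of a proportion p observed in a sample of n."""
    return math.sqrt(p * (1.0 - p) / n)

# Darbishire's mice: 458 normals out of 555, expectation three-quarters.
n_mice, p_normal = 555, 0.75
se_count = math.sqrt(n_mice * p_normal * (1.0 - p_normal))
ratio_mice = (458 - n_mice * p_normal) / se_count
p_mice = two_tail_probability(ratio_mice)

# Parkes and Drummond's rats: proportions of males on the two diets.
p_deficient, p_sufficient = 123 / 276, 145 / 295
se_diff = math.hypot(se_proportion(p_deficient, 276),
                     se_proportion(p_sufficient, 295))
ratio_sex = (p_sufficient - p_deficient) / se_diff
p_sex = two_tail_probability(ratio_sex)

print(round(se_count, 1), round(ratio_mice, 1), p_mice)
print(round(se_diff, 4), round(ratio_sex, 2), round(p_sex, 2))
```

The unrounded expectation (416.25) is used here, so the difference is 41.75 rather than the rounded 42 of the text; the ratio to the standard error is unaffected to the first decimal.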


3.52. When the mean number of failures (m) per set is large, 
the Poisson distribution also approaches the normal form. Then 
the standard error per set is √m, the standard error of the mean 
of N sets is √(m/N), and that of the total number of failures in 
N sets, mN, is 

$$\sqrt{mN}.$$ 

Thus for the distribution of yeast cells of Table 2.4, the total number 
of cells counted is 1872, with a standard error 

$$\pm\sqrt{1872} = \pm 43.3;$$ 

the mean number of cells per square is 4.68, and its standard error is 

$$\sqrt{\frac{4.68}{400}} = \pm 0.108.$$ 
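
The yeast-cell figures may be checked directly; this sketch takes the total of 1872 cells and the 400 squares from the example above.

```python
import math

total_cells = 1872
squares = 400

mean_per_square = total_cells / squares           # m
se_total = math.sqrt(total_cells)                 # sqrt(mN)
se_mean = math.sqrt(mean_per_square / squares)    # sqrt(m / N)

print(mean_per_square, round(se_total, 1), round(se_mean, 3))
```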


Measures of Dispersion 

3.53. The standard deviation estimated from the variance has a 
standard error of 

$$\frac{\sigma}{\sqrt{2N}},$$ 

where σ is the standard deviation of the population and N is the 
size of the sample. Its distribution is not normal, but for large samples 
it approaches normality; e.g. at N = 100, β1 = 0.0051 and β2 
= 3.0000. Let us see if the Scotsmen of Table 3.1 are really 
more regular in height, as well as being taller on the average, than 
the Englishmen. The standard error of a standard deviation is 
1/√2 times that of a mean, so for the difference of standard 
deviations of the table it is 

$$\frac{1}{\sqrt{2}} \times 0.0759 = \pm 0.0537.$$ 

The difference (2.548 - 2.480 = 0.068) being only 1.27 times 
its standard error is not significant; in fact, random sampling would 
give as big a difference about one trial in five (P = 0.2). 
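
The comparison of the two standard deviations may be sketched as below. The figure 0.0759 is taken from the text, where it appears to be the standard error of the difference between the two means of Table 3.1; that reading is an assumption, since the table itself is not reproduced here.

```python
import math
from statistics import NormalDist

se_difference_of_means = 0.0759  # from the text (assumed meaning, see above)
se_difference_of_sds = se_difference_of_means / math.sqrt(2.0)

difference = 2.548 - 2.480
ratio = difference / se_difference_of_sds

# Two-tailed chance of so large a ratio arising from random sampling.
probability = 2.0 * (1.0 - NormalDist().cdf(ratio))

print(round(se_difference_of_sds, 4), round(ratio, 2), round(probability, 2))
```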

Only when found from the second moment has the standard 
deviation the above standard error. We shall now deal with its 
standard error when estimated from the range. E. S. Pearson (1926 
and 1932) has worked out the sampling distribution of the range 
fairly fully, and although for no size of sub-sample is it normal, it 
is only moderately skew for those of about 10; for sub-samples 
of 10, β1 = 0.156 and β2 = … . In such circumstances, we 
may use the standard error as an approximate description of the 
sampling errors. Pearson gives the standard error or deviation of 
the range for several sub-samples, and for those of 10 it is 0.797σ, 
where σ is the true standard deviation. Suppose the sample of N 
contains m sub-samples of 10, so that N = 10m. Then from 
section 2.71, 

$$\text{Standard Error of Mean Range} = \frac{0.797\,\sigma}{\sqrt{m}}.$$ 

We also have from Table 1.2, 

$$\text{Estimated Standard Deviation} = \frac{\text{Mean Range}}{3.078},$$ 

whence 

$$\text{Standard Error of Estimated Standard Deviation} = \frac{\text{Standard Error of Mean Range}}{3.078} = \frac{0.797\,\sigma}{3.078\sqrt{m}} = \frac{1.158\,\sigma}{\sqrt{2N}}.$$ 


This may be compared with the standard error of the estimate 
obtained from the variance. 
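
The comparison may be made numerically. In this sketch the constants 0.797 and 3.078 are those quoted above for sub-samples of 10; the function names are hypothetical.

```python
import math

SE_RANGE_FACTOR = 0.797    # s.e. of the range in sub-samples of 10, in units of sigma
MEAN_RANGE_FACTOR = 3.078  # mean range / sigma for sub-samples of 10
SUB_SAMPLE = 10

def se_sd_from_range(sigma, n_total):
    """S.e. of the standard deviation estimated from the mean range."""
    m = n_total / SUB_SAMPLE  # number of sub-samples
    return SE_RANGE_FACTOR * sigma / (MEAN_RANGE_FACTOR * math.sqrt(m))

def se_sd_from_variance(sigma, n_total):
    """S.e. of the standard deviation estimated from the second moment."""
    return sigma / math.sqrt(2.0 * n_total)

# The ratio of the two does not depend on sigma or on N.
factor = se_sd_from_range(1.0, 100) / se_sd_from_variance(1.0, 100)
print(round(factor, 3))
```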

In large samples, the standard error of the mean deviation is 

$$\frac{1.068\,\delta}{\sqrt{2N}} = \frac{0.852\,\sigma}{\sqrt{2N}},$$ 

where δ is the population value of the mean deviation. It follows 
that the standard error of the standard deviation estimated from the 
mean deviation is 

$$1.253 \times \frac{0.852\,\sigma}{\sqrt{2N}} = \frac{1.068\,\sigma}{\sqrt{2N}}.$$ 
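
The factor 1.068 may be checked from first principles. The sketch below relies on two standard large-sample results for the normal population that are assumed here rather than taken from the text: the mean deviation is σ√(2/π), and its sampling variance is σ²(1 - 2/π)/N.

```python
import math

# Mean deviation of a normal population in units of sigma, and its reciprocal
# (the multiplier that converts a mean deviation into a standard deviation).
md_over_sigma = math.sqrt(2.0 / math.pi)
multiplier = 1.0 / md_over_sigma

# Large-sample s.e. of the mean deviation, in units of sigma / sqrt(N).
se_md = math.sqrt(1.0 - 2.0 / math.pi)

# S.e. of the derived standard deviation, in units of sigma / sqrt(2N).
factor = multiplier * se_md * math.sqrt(2.0)

print(round(multiplier, 3), round(factor, 3))
```

Algebraically the factor reduces to √(π - 2), so that the estimate from the mean deviation has a sampling variance about 1.14 times that of the estimate from the second moment.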


Constants of Shape 

3.54. The β ratios are useful for testing the departure of any data 
from normality, but their distributions in samples drawn from an 
infinite population of that form have not yet been worked out. The 
first four moments of the sampling distributions have been derived 
by Fisher (1930a), and from approximate values E. S. Pearson (1930) 
has determined appropriate empirical frequency curves of K. Pearson's 
system. Curves found in such a way usually give good 
approximations to the actual distributions. 

To test asymmetry, γ1 = √β1 (with the same sign as μ3) is used. 
For samples of N from the normal population, this has a standard 
error of √(6/N) (to a first approximation), and the distribution itself 
is so nearly normal that a deviation of twice the standard error lies 
practically on the 0.05 level of significance. 

The standard error of β2 is √(24/N) (to a first approximation), 
but the distribution is so skew, and this approximation is so poor, 
that it does not give a reliable test. Pearson's tables (loc. cit.) give, 
for samples from the normal population, values of β2 on each 
side of the population value (3.0) which cut off tails of 5 and 1 per 
cent of the whole curve; these tables should be used. From the 
discussion in section 3.41, it is suggested that the 5 per cent level 
of significance should be taken as lying between the 5 and 1 per 
cent values. 

For the distribution of heights of Table 1.5, we have 

γ1 = 0.116 ± 0.0746, β2 = … and N = 1078. 

γ1 is less than twice its standard error, and so is insignificant, while 
from Pearson's table, when N = 1000, the lower value of β2 with a 
"tail" of 0.05 is 2.76, and since the value above is nearer 3 than that, 
we may conclude that the data are, as far as we can tell, from a normal 
population. 
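
A minimal sketch of the calculation of the shape constants and of the first-approximation standard error of γ1 follows. It assumes the usual moment definitions (γ1 as the third moment divided by the 3/2 power of the second, β2 as the fourth moment divided by the square of the second); the function names are hypothetical, and the small data set is invented purely to exercise the code.

```python
import math

def shape_constants(data):
    """Return (gamma1, beta2) computed from the moments about the mean."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    gamma1 = m3 / m2 ** 1.5  # carries the sign of the third moment
    beta2 = m4 / m2 ** 2
    return gamma1, beta2

def se_gamma1(n):
    """First-approximation standard error of gamma1 in normal samples of n."""
    return math.sqrt(6.0 / n)

gamma1, beta2 = shape_constants([1.0, 2.0, 3.0, 4.0, 5.0])  # invented data
print(gamma1, beta2, round(se_gamma1(1078), 4))
```

For β2 the corresponding approximation √(24/N) is, as stated above, too poor to serve as a test, and tabulated percentage points should be preferred.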
Standard Errors of Functions of Statistical Constants 
3.55. Sometimes it is desired to calculate the standard errors of 
functions of one or more statistical constants, the errors of the 
individual constants being known. We have already had an example 
in the difference between two means; the ratio and product of two 
means, and the square of the standard deviation are other examples. 
We shall describe here an approximate method that is applicable 
to large samples. Where there are more than one constant we shall 
assume they are obtained from independent samples except in one 
instance. 

Let the population values of the constants be α and β, let the 
function be 

$$\lambda = f(\alpha, \beta)$$ 

and let the corresponding sample values for any one pair of samples 
be l, a and b, where 

$$l = \lambda + \delta\lambda, \quad a = \alpha + \delta\alpha \quad\text{and}\quad b = \beta + \delta\beta;$$ 

δ is used as a sign meaning a deviation. Then for the pair of samples 

$$l = f(\alpha + \delta\alpha, \beta + \delta\beta).$$ 

It can be shown that, for the purposes of deducing standard errors 
and variances, the relations between δλ, δα and δβ may be 
approximately derived by regarding them as mathematical differentials, so 
that 

$$\delta\lambda = \frac{\partial f}{\partial\alpha}\,\delta\alpha + \frac{\partial f}{\partial\beta}\,\delta\beta.$$ 

The degree of approximation involves neglecting quantities of the 
order 1/N compared with unity, where N is the number in the 
sample. 

The mean values of the squares of δλ, δα and δβ for all possible 
pairs of samples from the population are the squares of the 
corresponding standard errors and may be written $(SE_l)^2$, $(SE_a)^2$ and 
$(SE_b)^2$. Hence, by squaring the terms of the above equation and 
finding the means for all pairs of samples, 

$$(SE_l)^2 = \left(\frac{\partial f}{\partial\alpha}\right)^2 (SE_a)^2 + \left(\frac{\partial f}{\partial\beta}\right)^2 (SE_b)^2 + 2\,\frac{\partial f}{\partial\alpha}\,\frac{\partial f}{\partial\beta}\,[\delta\alpha\,\delta\beta],$$ 

where [δα δβ] is the mean value of the product of the deviations 
δα and δβ for all pairs of samples. If the two samples in the pair 
are independent, this mean product is zero, as will be shown in 
Chapter VII.* Hence, for independent pairs of samples, 

$$(SE_l)^2 = \left(\frac{\partial f}{\partial\alpha}\right)^2 (SE_a)^2 + \left(\frac{\partial f}{\partial\beta}\right)^2 (SE_b)^2. \qquad (3.6)$$ 

This equation may easily be extended to three or more statistical 
constants. 

When the function is the difference between two means, 
equation (3.1), section 3.21, follows from (3.6) directly. 

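By way of example, let us deduce the standard error of the 
variance v of a sample in terms of that of the standard deviation s, 
so that l = v, a = s and there is no b. Let the population value of 
the standard deviation be σ; then 

$$v = s^2, \qquad \frac{\partial v}{\partial s} = 2\sigma \qquad\text{and}\qquad (SE_v)^2 = 4\sigma^2 (SE_s)^2.$$ 

As another example, let l be the ratio between two means x̄ and ȳ 
with corresponding population values of λ, ξ and η. Then 

$$l = \frac{\bar{x}}{\bar{y}} \qquad\text{and}\qquad (SE_l)^2 = \frac{1}{\eta^2}(SE_{\bar{x}})^2 + \frac{\xi^2}{\eta^4}(SE_{\bar{y}})^2.$$ 

The standard errors of x̄ and ȳ have already been given in section 2.71. 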

It has been shown in section 2.7s that the mean and standard 
deviation estimated from the same sample are independent, and it 
follows from this that equation (3.6) may be used to determine the 
standard error of the coefficient of variation. If 100κ, ξ and σ are 
respectively the population values of the coefficient of variation, 
mean and standard deviation, corresponding to λ, α and β, and 
k is the sample estimate of κ, it is easy to see from equation (3.6) 
and the standard errors given in sections 2.71 and 3.53 that the 
standard error of the coefficient of variation is 

$$100\kappa\sqrt{\frac{2\kappa^2 + 1}{2N}}.$$ 

* After reading Chapter VII, readers will be able to evaluate the product 
term and so deal with constants that are not independent. 
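
Equation (3.6) lends itself to a general numerical routine. The sketch below is an illustration only: it assumes independent constants, replaces the differential coefficients by central finite differences, and uses invented population values (σ = 5, ξ = 50, N = 200). The function name `se_of_function` is hypothetical.

```python
import math

def se_of_function(f, values, std_errors, h=1e-6):
    """First-order standard error of f(*values) for independent constants,
    with the partial derivatives found by central differences."""
    total = 0.0
    for i, (value, se) in enumerate(zip(values, std_errors)):
        step = h * max(1.0, abs(value))
        up, down = list(values), list(values)
        up[i], down[i] = value + step, value - step
        derivative = (f(*up) - f(*down)) / (2.0 * step)
        total += (derivative * se) ** 2
    return math.sqrt(total)

# Invented population values, used only to exercise the routine.
sigma, xi, n = 5.0, 50.0, 200
se_mean = sigma / math.sqrt(n)
se_sd = sigma / math.sqrt(2.0 * n)

# Variance as a function of the standard deviation: expect 2 * sigma * se_sd.
se_variance = se_of_function(lambda s: s * s, [sigma], [se_sd])

# Coefficient of variation 100 * s / mean, the two constants being independent.
se_cv = se_of_function(lambda mean, s: 100.0 * s / mean,
                       [xi, sigma], [se_mean, se_sd])

kappa = sigma / xi
closed_form_cv = 100.0 * kappa * math.sqrt((2.0 * kappa ** 2 + 1.0) / (2.0 * n))
print(se_variance, se_cv, closed_form_cv)
```

The numerical result for the coefficient of variation agrees with the closed expression, which provides a check on the algebra.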
Determination of Population Value from Sample 
3.6. We have seen how a statistical constant determined from a 
sample may be used to test some hypothesis concerning the 
population value. Sometimes, however, we are unable to postulate the 
population value, of which we look to the sample to teach us some- 
thing. This process of estimating the population value from the 
sample or of arguing from the particular to the general is the 
inductive method, and the question arises as to the accuracy of the 
estimate given by the sample. This may be answered if the sampling 
distribution of the constant is known, and we shall now illustrate 
the kind of answer provided by referring to samples of 100 
individuals taken to measure the proportion having a given character, 
i.e. the proportion of successes. 

Fig. 7. 

The population value of this proportion may be denoted by π* 
and any sample value by p. Then a diagram like that of Fig. 7 may 
be drawn (this is not drawn to scale), where for any population 
value π1 the point B for which p = π1 represents the mean of all 
the sample values, and the points A and C represent values of p 
lying on the 5 per cent level of significance. These points may be 
found from the sampling distribution of p, which is given by the 
terms of the binomial {π1 + (1 - π1)}^100, and if we may assume this 
to be approximately normal, 

$$AB = BC = \text{Twice the Standard Error} = 2\sqrt{\frac{\pi_1(1 - \pi_1)}{100}}.$$ 

For example, if π1 = 0.8, B is at p = 0.8, and the standard error 
is √(0.8 × 0.2/100) = 0.04; hence A is at p = 0.8 + 0.08 
= 0.88 and C is at p = 0.8 - 0.08 = 0.72. In this way, points 
corresponding to A and C may be found for all possible values of π, 
and these lie on the curved lines shown diagrammatically in Fig. 7. 
The points corresponding to B lie on the line p = π. Then we 
know that for any one value of π, 95 per cent of the sample values 
lie within the limits represented by the points on the curved lines 
at which an ordinate drawn at π cuts them, and so by adding the 
experience for all populations, we know that 95 per cent of the 
sample values lie within the limits of the curved lines, whatever may 
be the relative frequencies with which the different values of π occur. 

* Here π is not 3.14159 . . . 

If in making inferences from samples it is assumed that all 
sample-population points lie within the limits, i.e. that for every 
sample value p0 (say) the population value lies between π3 and π2, 
we shall be right in 95 per cent of the inferences. In this way, it 
is possible to make an estimate from a sample of the range within 
which the population value lies, with a given long-run risk of being 
wrong. 

Fisher (1930b) has termed the limits represented by the curved 
lines the fiducial limits and the corresponding proportion of correct 
inferences (95 per cent in Fig. 7) the fiducial probability. Neyman 
(1934) uses the terms confidence limits and confidence coefficient. It is 
possible, of course, to determine limits corresponding to other 
fiducial probabilities, e.g. for the 99 per cent level, and for any 
statistical constant that has a known continuous sampling 
distribution. In order to determine the sampling distribution of the 
constant, it is usually necessary to assume the form of the 
population, e.g. normal, and it may be necessary to assume the standard 
deviation. When the distribution is skew, difficulties of the kind 
mentioned in section 3.41 have to be faced. 

If Fig. 7 were constructed by the methods outlined, the 
result would be limits of p for known values of π; for practical 
purposes it is more convenient to have limits of π for known 
equidistant values of p. It is not proposed here to go into the method 
of arriving at this desired form of expression (simple inverse 
interpolation is effective), but it is pointed out that the limits of π 
corresponding to a sample value p are not p ± 2√(p(1 - p)/100). 
For example, when p0 = 0.2, π2 = 0.291 and π3 = 0.132. It will 
be noted that the limits of π are not even equidistant from p. These 
conditions arise when the standard error of a statistical constant 
depends on the population value of the constant as for the mean 
of a binomial or Poisson series. When this is not so, and the standard 
error is independent of the constant, the 95 per cent fiducial limits 
of the population value are two parallel straight lines, and for any 
one sample value are at twice the standard error above and below 
that value, assuming a normal sampling distribution. Thus the 
95 per cent limits of the mean height of Englishmen in Table 3.1 
are 67.4375 ± 2 × 0.03238, i.e. 67.5023 and 67.3727 inches. 
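
The limits quoted for a sample value of 0.2 may be obtained without interpolation, by solving for the population values whose curved lines pass through the sample value. The sketch below is one way of doing so under the normal approximation used above, taking "twice the standard error" as the criterion; the function name `limits_for_proportion` is hypothetical.

```python
import math

def limits_for_proportion(p0, n, k=2.0):
    """Population proportions from which the sample value p0 deviates by
    exactly k standard errors, i.e. the roots in pi of
    (p0 - pi)^2 = k^2 * pi * (1 - pi) / n."""
    c = k * k / n
    a = 1.0 + c
    b = -(2.0 * p0 + c)
    root = math.sqrt(b * b - 4.0 * a * p0 * p0)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)

lower, upper = limits_for_proportion(0.2, 100)

# For comparison, the limits that are NOT correct: p +/- 2 * sqrt(p(1 - p)/n).
half_width = 2.0 * math.sqrt(0.2 * 0.8 / 100)
naive = (0.2 - half_width, 0.2 + half_width)

print(round(lower, 3), round(upper, 3), naive)
```

The two roots are not equidistant from the sample value, whereas the naive limits (0.12 and 0.28) are, which is the point made in the text.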

Fiducial probability is not the same as inverse probability, although 
it appears superficially to be so. An inverse probability statement 
would be applied to any particular instance and would be of the 
form: "given a sample value p0, the probability of the population 
value lying between π2 and π3 is 0.95" (see Fig. 7). This involves 
the conception of a super-population of population values 
corresponding to the one sample value p0, 95 per cent of which are 
between π2 and π3. Such a conception lies behind any statistical 
interpretation of inverse probability and presents many difficulties, 
one of the chief of which is that there is no basis for making any 
assumption regarding the distribution in the super-population. The 
fiducial probability statement says that for all possible sample values 
taken from any populations, 95 per cent of the population values 
of π lie between the limits shown in Fig. 7; no assumption is made 
as to the distribution of the population values. 

Fiducial limits can only be calculated in the way shown in so far 
as the sampling distribution is continuous or may be approximately 
represented by a continuous distribution. 


Choice of Statistical Constants 

3.7. We have seen in Chapter I that there may be several statistical 
constants for measuring one characteristic of a sample or 
population; for example, the standard deviation, mean deviation, quartile 
deviation and mean range are all measures of dispersion. The 
question arises: Are there any grounds on which one constant is 
to be preferred to another? Fisher's theory of estimation (1923, 
1925a and 1936) provides an answer, and we shall give as much of 
it as seems necessary to show the kind of answer provided. 

So far, we have regarded statistical constants as convenient 
measures of various characteristics of samples and populations, and 
have assumed some mathematical form of distribution for the popu- 
lation more or less incidentally. In the theory of estimation it is 
necessary to assume or specify some mathematical form of distribu- 
tion in the population as a starting-point. The equation of this 
form has one or more constants or parameters that may vary from 
one population of that form to another, and the particular values 
of which define any given population. For example, if the assumed 
form of the population is the normal curve described by 
equation (2.4), the two parameters are m and σ, and given values of 
these define a particular normal population. On this view, m and σ 
are not thought of as the mean and standard deviation, i.e. as 
measures of position and dispersion; they are thought of as 
mathematical parameters. Similarly, statistical constants (Fisher calls them 
statistics) calculated from samples are not descriptive averages; they 
are estimates of the parameters in the equation, and it is on this 
basis that the relative values of equivalent statistics are judged. 
Here, we need concern ourselves with two only of the criteria given 
by Fisher; they are consistency and efficiency. 

Consistency. A consistent estimate of a parameter must equal the 
parameter when it is derived from the whole distribution in the 
population. In a normal population, all the above measures are 
consistent estimates of σ. This criterion is an obvious one and is 
satisfied by all statistical estimates in common use. 

Efficiency. This criterion applies to those statistical estimates 
from large samples for which the sampling distribution is known 
and is normal, so that it is completely described by the mean and 
standard error.* The mean of the estimate in the sampling 
distribution may differ from the parameter, but this difference may be 
calculated and allowed for as bias. The standard error defines the 
inaccuracy of the estimate; here it will be more convenient to deal 
with the square of the standard error, viz. with the variance of the 
estimate. 

* As an approximation, the ideas and criteria may be applied where the 
sampling distribution is nearly normal, i.e. to all constants the standard … 

For statistics of the class with which we are now dealing the 
variance in samples of N may be expressed in the form 

$$\frac{1}{IN},$$ 

where I depends on the statistic and is called its intrinsic accuracy. 
For example, the intrinsic accuracy of the mean as an estimate of m 
in the normal population is 1/σ² and that of the square root of the 
second moment as an estimate of σ is 2/σ². 

We have seen in section 3.53 that the various estimates of the 
standard deviation have differing standard errors, and hence intrinsic 
accuracies. Moreover, an increase in I has the same influence on 
the sampling variance as a proportionate increase in N, so that 
differences between intrinsic accuracies may be expressed in terms 
of N. Thus, for σ estimated from the second moment, I = 2/σ², 
and for the estimate obtained from the mean range in sub-samples 
of 10, I = 2/(1.34σ²). The variance of the estimate from the range 
in a sample of 100 is equal to that of the estimate from the second 
moment in a sample of 100/1.34 = 75, since 

$$\frac{\sigma^2}{2 \times 75} = \frac{1.34\,\sigma^2}{2 \times 100}.$$ 

The loss in accuracy due to using the range instead of the second 
moment is equivalent to a 25 per cent reduction in the size of the 
sample. We may say that the estimate from the range has an efficiency 
of 75 per cent of the estimate from the second moment. 
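
The figure of 75 per cent follows from the factor 1.158 found in section 3.53 for the estimate from the mean range. A minimal sketch of the arithmetic, with illustrative variable names:

```python
se_factor_range = 1.158  # s.e. from the range relative to sigma / sqrt(2N)

variance_ratio = se_factor_range ** 2  # ratio of sampling variances
efficiency = 1.0 / variance_ratio      # ratio of intrinsic accuracies
equivalent_sample = 100 * efficiency   # size of sample of equal accuracy

print(round(variance_ratio, 2), round(efficiency, 2), round(equivalent_sample))
```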

For each parameter, there is a class of statistics that have the 
same intrinsic accuracy, that is greater than the intrinsic accuracy 
of all other estimates, and these are called efficient statistics. The 
ratio of the intrinsic accuracy of any given statistic to that of an 
efficient one is called the efficiency of the given statistic. The mean 
and standard deviation as defined in sections 1.22 and 1.23 are 
efficient estimates of m and σ in the normal population. 

When using one estimate of a parameter, we may regard the 
sample as giving a certain quantity of information regarding the 
parameter, that increases in proportion to the size of the sample. 
Each individual may be regarded as contributing a unit amount of 
information, and the connection between the intrinsic accuracy and 
size of sample suggests that the intrinsic accuracy should be the 
unit. Fisher in his later writings (1935) has referred to intrinsic 
accuracy as the quantity of information given by a single observation. 
In these terms, the mean range in sub-samples of 10 may be said 
to give three-quarters of the amount of information given by the 
second moment regarding the parameter σ. Just as for a given 
estimate the number of individuals is additive in the sense that two 
independent samples of N1 and N2 give the same amount of 
information as one sample of N1 + N2, so for different estimates and 
samples, independent quantities of information are additive. 

When combining estimates of a parameter that have normal 
sampling distributions, the estimates being from different samples, 
we may take a weighted mean, using the quantities of information 
as weights. This is analogous to combining estimates of the mean, 
say, by using the numbers in the samples as weights. The variance 
of the combined estimate is the inverse of the quantity of informa- 
tion. This result is used in sections 8.22 and 11.73. 
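
The rule of combination may be sketched as follows, taking the quantity of information in each estimate as the reciprocal of its sampling variance. The function name `combine_estimates` and the numerical values are invented for illustration.

```python
def combine_estimates(estimates, variances):
    """Weighted mean of independent estimates of one parameter, the
    weights being the quantities of information (reciprocal variances).
    Returns the combined estimate and its variance."""
    information = [1.0 / v for v in variances]
    total_information = sum(information)
    combined = sum(w * e for w, e in zip(information, estimates)) / total_information
    return combined, 1.0 / total_information

# Two invented estimates with sampling variances 4 and 1.
combined, combined_variance = combine_estimates([10.0, 12.0], [4.0, 1.0])
print(combined, combined_variance)
```

The combined variance is smaller than that of either estimate alone, as it must be if independent quantities of information are additive.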

There are several reasons why efficient statistics should be used 
except in special circumstances. The use of an inefficient statistic 
is equivalent to throwing away part of the data, and such a course 
is usually only justifiable if there is a proportionate saving in labour, 
say in computation. When efficient statistics are used, mistaken 
inferences of the second kind mentioned in section 3.3 are reduced 
to a minimum. Further, populations are estimated with maximum 
precision by the use of efficient statistics, for these have fiducial 
limits corresponding to a given fiducial probability, that are closer 
than for any other estimates of the same parameter. The methods 
of the next chapter apply only when efficient estimates are used. 

A distinction must here be made between different statistical 
constants that are merely mathematical transformations of one 
constant, and constants that are essentially different estimates and 
are only related statistically. For example, the variance and standard 
deviation are mathematically equivalent; samples that have the 
same variance must necessarily have the same standard deviation 
and the two constants are equally efficient. On the other hand, the 
standard deviation, mean deviation, and mean range in sets of 10 
are only statistically related in that the relations given in section 1.23 
are true only on the average and do not necessarily apply to any 
particular sample; different samples having the same standard 
deviation may have different mean deviations and mean ranges. 
These estimates do not have the same efficiency. 


Method of Maximum Likelihood

3.8. This is a general method for arriving at estimates of parameters 
of distributions, and Fisher has shown that estimates given by this 
method are efficient, provided efficient statistics of the parameter 
exist. 

We have already seen in section 2.72 how the probability of a 
given sample may be deduced when the population and its para- 
meters are known. When the form of population is known the 
expression for that probability (without the differential terms) for 
any assumed values of the parameters is termed the likelihood of 
the assumed values. The particular values that make this likelihood 
a maximum are the maximum likelihood estimates. They are obtained 
by differentiating the logarithm of the likelihood with respect to 
the parameters, equating to zero and solving the equations. 

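For the normal distribution with assumed parameters $\hat{\mu} = \bar{x}$ and
$\hat{\sigma} = s$* and a sample of N, the logarithm of the likelihood (to the
base e) is:

$$L = -\frac{N}{2}\log 2\pi - N\log s - \frac{1}{2}\left[\frac{(x_1 - \bar{x})^2}{s^2} + \frac{(x_2 - \bar{x})^2}{s^2} + \dots + \frac{(x_N - \bar{x})^2}{s^2}\right] \qquad (3.7)$$

If this is differentiated with respect to $\bar{x}$ and the differential equated
to zero, it is easily deduced that

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_N}{N}$$
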


Thus, the mean of the sample, as defined in section 1.22, is the
maximum likelihood estimate of $\mu$. Similarly, by differentiating
L in (3.7) with respect to s and equating to zero, the maximum
likelihood estimate of $\sigma$ is found to be the standard deviation as
defined in section 1.23.
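
As a numerical check on this result, the following Python sketch evaluates the logarithm of the likelihood of a normal sample and confirms that it falls when either parameter is moved away from the sample mean and the standard deviation with divisor N. The sample values and the function names are hypothetical.

```python
import math


def log_likelihood(xs, mean, s):
    """Log-likelihood of a normal sample for assumed values of mean and s."""
    n = len(xs)
    sum_sq = sum((x - mean) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi) - n * math.log(s) - sum_sq / (2 * s * s)


def ml_estimates(xs):
    """Sample mean and standard deviation (divisor N, not N - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean, s


sample = [4.1, 5.3, 6.0, 4.8, 5.8]  # hypothetical observations
mean_hat, s_hat = ml_estimates(sample)
best = log_likelihood(sample, mean_hat, s_hat)

# Likelihood at nearby parameter values; each should be lower than `best`.
neighbours = [
    log_likelihood(sample, mean_hat + dm, s_hat + ds)
    for dm in (-0.05, 0.0, 0.05)
    for ds in (-0.05, 0.0, 0.05)
    if (dm, ds) != (0.0, 0.0)
]
print(mean_hat, s_hat, best > max(neighbours))
```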

* The Latin letters denote estimates from the sample and the sign ^
denotes the particular kind of estimate dealt with in this section.
3.81. When the data are in the form of a frequency distribution
with frequencies $n_1, n_2, \dots, n_s, \dots$ in the groups, the
logarithm of the likelihood is

$$L = S\, n_s \log \nu_s \qquad (3.8)$$

where S is the summation over all groups and $\nu_s$ is the estimated
frequency in the sth group, determined from the equation for the
assumed population and expressed in terms of the unknown parameters.

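For example, for the Poisson distribution the proportion of frequency
with s successes is

$$e^{-\mu}\,\frac{\mu^s}{s!}$$

If m is the maximum likelihood estimate of $\mu$, and the total number
in the sample is N,

$$L = S\, n_s \log\left(N e^{-m}\,\frac{m^s}{s!}\right) = S\, n_s\,(\log N - m + s \log m - \log s!)$$

By equating to zero the first differential of this with respect to m
we have

$$m = \frac{S\, s\, n_s}{S\, n_s} = \frac{S\, s\, n_s}{N}$$
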

Thus, the efficient estimate of $\mu$ is the mean of the distribution.
This result is not without interest, since we have seen in section 2.3
that the second moment of the distribution is also a possible estimate
of $\mu$, and hitherto no rational grounds have been advanced
for regarding it as an inferior estimate.
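
The same conclusion may be checked numerically. The sketch below is illustrative only: the frequencies are hypothetical, and the search over a grid of trial values of m is a convenience rather than part of the method. It evaluates the grouped log-likelihood for a Poisson distribution and finds the trial value that makes it greatest.

```python
import math


def poisson_log_likelihood(counts, m):
    """Sum of n_s * log(nu_s), where nu_s = N * exp(-m) * m**s / s!."""
    total = sum(counts)
    return sum(
        n_s * (math.log(total) - m + s * math.log(m) - math.lgamma(s + 1))
        for s, n_s in enumerate(counts)
    )


# Hypothetical frequencies of 0, 1, 2, 3 and 4 successes.
counts = [30, 40, 20, 8, 2]
sample_mean = sum(s * n_s for s, n_s in enumerate(counts)) / sum(counts)

# Trial values of m from 0.50 to 2.00 in steps of 0.01.
best_step = max(range(50, 201), key=lambda k: poisson_log_likelihood(counts, k / 100))
print(sample_mean, best_step / 100)
```

The trial value giving the greatest likelihood coincides with the mean of the distribution, as the algebra requires.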

The method of maximum likelihood may be used whenever a 
distribution can be expressed in terms of parameters that are required 
to be estimated from the sample. Fisher (1936a) has used it in
genetical studies for estimating linkages. 

3.82. When an efficient statistic has a normal sampling distribution,
the square of the standard error or variance in a large sample may
be obtained from the second differential of the logarithm of the
likelihood L. Generally, if $\kappa$ is the parameter, $k$ the maximum
likelihood estimate, and $\sigma_k$ is the standard error of $k$,

$$-\frac{1}{\sigma_k^2} = \left[\frac{\partial^2 L}{\partial k^2}\right]$$

where the square brackets [ ] signify "the mean for all possible
sample values of the term contained therein."

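For the normal distribution, on differentiating equation (3.7)
twice, we find

$$\frac{\partial^2 L}{\partial \bar{x}^2} = -\frac{N}{s^2}$$

and the mean value of $s^2$ for all possible samples is the population
value $\sigma^2$, whence

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{N}$$
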
This is an alternative derivation of the square of the standard error 
of the mean. 





CHAPTER IV 


GOODNESS OF FIT AND CONTINGENCY TABLES 


4.1. So far we have been using as expressions of differences between
distributions, constants like the mean, standard deviation, and $\beta_2$,
which summarise their chief properties or are estimates of their
parameters. Except for populations of particular forms, these do not
express all the features of the distributions. It is desirable to have
some index which measures the degrees of difference between the
actual frequencies in the groups, and so compares all the essential
features. Such is K. Pearson's (1900) $\chi^2$, which we shall first use to
measure the deviations of an experimental distribution from the form
of some hypothetical population. Both must be grouped in the same
way, and the theoretical distribution must be adjusted to give the
same total frequency; then if $\nu_s$ is the number of observations in any
one group in the theoretical distribution, and $n_s$ is the corresponding
number in the experimental one,

$$\chi^2 = S\,\frac{(n_s - \nu_s)^2}{\nu_s} \qquad (4.1)$$

where S is the summation over all groups. It will be appreciated
that since $(n_s - \nu_s)$ is squared, all differences in frequency, whether
positive or negative, add a positive amount to $\chi^2$; further, that the
greater these differences are, the greater is $\chi^2$; if the two distributions
are exactly alike, $\chi^2$ is zero.


Sampling Distribution of $\chi^2$

4.2. In using $\chi^2$ to test whether one distribution differs from the
other, we must remember that because of random errors, $\chi^2$ will
never (or hardly ever) be zero, and we need to know its sampling
distribution so that we can tell the probability of an observed $\chi^2$
being given by a random sample from the hypothetical population.
As usual, if that probability (which is symbolised by P) is low enough,
the $\chi^2$ is said to be significant, and it is unreasonable to suppose that
such a significant value could be a result of sampling errors alone.
Subject to certain restrictions, the distribution of $\chi^2$ is given by the
equation,

$$df = k\,\chi^{g-1} e^{-\chi^2/2}\, d\chi \qquad (4.2)$$

where df is the element of frequency between ordinates drawn at
$\chi$ and $\chi + d\chi$, the frequency between $\chi = 0$ and $\chi = \chi_1$ being
$\int_0^{\chi_1} df$, k is a constant and g is the number of degrees of freedom.


This conception of degrees of freedom is not altogether easy to 
attain, and we cannot attempt a full justification of it here; but we 
shall show its reasonableness and shall illustrate it, hoping that as 
a result of familiarity with its use the reader will appreciate it. Here 
the number of degrees of freedom is the number of groups, modified. 
Clearly, since the error in each group in the experimental distribution 
contributes a positive amount to $\chi^2$, the greater the number of groups
the larger would $\chi^2$ be expected to be, as a result of random variations
alone, and account of this is taken through the quantity g. Further, 
it is often the practice to fit the theoretical distribution to the obser- 
vations by calculating constants from the sample, just as in the 
example of section 2.51 we fitted a normal curve by making its 
mean, standard deviation and total equal to those of the sample. If 
we wish to test the adequacy of the theoretical form, further account 
must be taken of the degree to which we have made it fit the obser- 
vations by this method. Suppose, in an extreme case, there were 
g' groups and we fitted a curve involving g' constants which were 
calculated from the data; then the two distributions would agree 
exactly and $\chi^2$ would be zero because sampling errors would have
had no play. To take account of this second factor, we must subtract 
from the number of groups (say g') the number of constants that 
have been determined from the data in fitting,* in order to obtain 
the degrees of freedom (g). Every constant so determined has the
effect, from the point of view of $\chi^2$, of reducing the number of groups
by one, and the number of degrees of freedom may be regarded
effectively as the number of independent groups remaining to
contribute to $\chi^2$. When only the totals have been made equal, $g = g' - 1$;
but in a case like the fitting of a binomial to the number of germinating
seeds in Table 2.3, the theoretical distribution has been adjusted to
make its mean and total both equal to the sample, and $g = g' - 2$.

One restriction under which equation (4.2) for the sampling
distribution of $\chi^2$ applies is that no group contains very few
individuals; 10 is about the lower limit. To arrive at the distribution

* Provided these constants are determined by processes which satisfy the
criterion of efficiency.

of $\chi^2$ experimentally, we should have to take many samples. Now the
probability of any random individual belonging to the sth group
(say) is $\nu_s/N$, where N is the total number in the sample, and
consequently in all the samples, the actual numbers falling into the sth
group ($n_s$) will vary according to the binomial

$$\left\{\left(1 - \frac{\nu_s}{N}\right) + \frac{\nu_s}{N}\right\}^N$$

(this is a direct application of the binomial theory given in section
2.2). If $\nu_s$ is not too small, this binomial approximates to the normal
distribution, and it is on the assumption that $n_s$ is distributed
normally that equation (4.2) is obtained. This is the reason why in
using $\chi^2$ no group should contain much fewer than ten individuals,
and to satisfy this in practice the tail groups, which often have very
low frequencies, are combined.

A second restriction is the usual one that all the individuals in
the sample must be independent. Indeed, $\chi^2$ may be used as a test
of independence.

In its sampling distribution, $\chi^2$ may vary between zero and plus
infinity, and since there is no question of negative deviations, and
we are asking if the observed $\chi^2$ is really greater than zero, the
value at which an ordinate cuts off a tail of 5 per cent. also lies on the
5 per cent. level of significance. Since, in general practice, we should
not test a very small value of $\chi^2$ for the significance of its deviation
from the average, we do not include the tail near $\chi^2 = 0$ in the test
for significance; this is in accordance with the principles discussed
in section 3.32.

There are two sets of tables of the probability integral of equation
(4.2). The first, calculated by Elderton, is included in Pearson's
Tables for Statisticians and Biometricians, and gives the areas of
the tails beyond integral values of $\chi^2$ for different values of $n'$, where
$n'$ is one more than the number of degrees of freedom ($= g + 1$).
This was originally calculated for the case where the theoretical
distribution is only made to agree with the data in respect of its total,
and was wrongly applied to other cases for some time before the
necessity for correcting to degrees of freedom (which was pointed out
by Fisher in 1922b) was realised. The other tables by Fisher (1936a)
give values of $\chi^2$ lying on different levels of significance, and in these
the degrees of freedom are used directly and are called n.

in that there is no assumption made as to the normality of the
distributions being compared.

First let us test the fit of the normal curve to the distribution
of heights of men in Table 1.5; the arithmetical operations are set
out in Table 4.1, and it will be noticed that the tail groups have
been lumped together, giving 14 groups. Three constants have been
fitted (total, mean and standard deviation), leaving 11 degrees of
freedom, and we wish to see if the $\chi^2$ (8.59) resulting from those
11 can be attributed to random variations. Entering Elderton's table
at $n' = 12$, we find P = 0.66, and this is so large that we might
reasonably suppose the deviations to have arisen from errors of
random sampling, and we say that the normal curve gives a good fit.
Indeed, 66 random samples in 100 would have given a $\chi^2$ equal to or
greater than 8.59. In computing $\chi^2$ the total of the fourth column
(= 0) checks the accuracy of the subtractions; and the terms in the
sixth column are the products of those in the fourth and fifth.


TABLE 4.1

Stature in    Frequencies              (n_s - ν_s)   (n_s - ν_s)/ν_s   (n_s - ν_s)²/ν_s
Inches        Observed    Expected
              (n_s)       (ν_s)

below 61.5      14.5        11.8           2.7            0.229              0.62
61.5-           17          17.7          -0.7           -0.040              0.03
62.5-           33.5        35.5          -2.0           -0.056              0.11
63.5-           61.5        62.8          -1.3           -0.021              0.03
64.5-           95.5        96.7          -1.2           -0.012              0.01
65.5-          142         130.1          11.9            0.091              1.08
66.5-          137.5       153.0         -15.5           -0.101              1.57
67.5-          154         157.1          -3.1           -0.020              0.06
68.5-          141.5       141.0           0.5            0.004              0.00
69.5-          116         110.5           5.5            0.050              0.27
70.5-           78          75.7           2.3            0.030              0.07
71.5-           49          45.2           3.8            0.084              0.32
72.5-           28.5        23.7           4.8            0.203              0.97
over 73.5        9.5        17.2          -7.7           -0.448              3.45

Total         1078        1078.0           0.0                          χ² = 8.59
                                                                        n' = 12
                                                                        P  = 0.66
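
The arithmetic of Table 4.1 may be repeated by machine. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the tail probability is computed from the standard finite series for a whole number of degrees of freedom rather than read from Elderton's table, and the frequencies are those of the table as rounded in print, so the last figure of $\chi^2$ may differ slightly from 8.59.

```python
import math


def chi_square(observed, expected):
    """Equation (4.1): sum of (n_s - nu_s)^2 / nu_s over all groups."""
    return sum((n - v) ** 2 / v for n, v in zip(observed, expected))


def chi_square_tail(x, g):
    """Probability of chi-square exceeding x, for integer g degrees of freedom."""
    h = x / 2.0
    if g % 2 == 0:
        # Even g: exp(-h) * sum of h^k / k! for k = 0 .. g/2 - 1.
        term, total = math.exp(-h), 0.0
        for k in range(g // 2):
            total += term
            term *= h / (k + 1)
        return total
    # Odd g: erfc(sqrt(h)) plus terms exp(-h) * h^(k - 1/2) / Gamma(k + 1/2).
    total = math.erfc(math.sqrt(h))
    term = math.exp(-h) * math.sqrt(h) / math.gamma(1.5)
    for k in range(1, (g - 1) // 2 + 1):
        total += term
        term *= h / (k + 0.5)
    return total


observed = [14.5, 17, 33.5, 61.5, 95.5, 142, 137.5, 154, 141.5, 116, 78, 49, 28.5, 9.5]
expected = [11.8, 17.7, 35.5, 62.8, 96.7, 130.1, 153.0, 157.1, 141.0, 110.5, 75.7,
            45.2, 23.7, 17.2]

fitted_constants = 3  # total, mean and standard deviation
degrees_of_freedom = len(observed) - fitted_constants
chi2 = chi_square(observed, expected)
p_value = chi_square_tail(chi2, degrees_of_freedom)
print(round(chi2, 2), degrees_of_freedom, round(p_value, 2))
```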
Similarly, the distribution of germinating seed in Table 2.3 gives
a $\chi^2$ of 3.897 when the groups containing four or more seeds are
combined; there are five groups and three degrees of freedom (the
mean and total have been equalised), and entering Elderton's table
at $n' = 4$, we find P = 0.41, showing that the fit is a good one.

An experimental distribution of flower colours and the Mendelian
expectations are shown in Table 4.2 (Wheldale, 1907).
There are 7 groups, and as only the total has been found from the
sample (the expected frequencies, being determined from Mendelian
theory, do not depend on the experimental data in any other way),
there are 6 degrees of freedom ($n' = 7$); P = 0.13;
hence the deviations of experiment from expectation cannot be
regarded as significant.


TABLE 4.2

Colour of Flowers    Expected     Experimental
                     Frequency    Frequency

Magenta               118.34        107
Magenta delila         39.44         42
Ivory                  52.59         67
Crimson                39.44         42
Yellow                 17.53         24
Crimson delila         13.15         12
White                  93.51         80

Total*                374.00        374

The Additive Nature of $\chi^2$

4.3. A useful property of $\chi^2$ is that several values can be added
together, and if the degrees of freedom are also added, the total
deviations of several distributions can be tested by entering the
tables at the total degrees of freedom and finding the probability
corresponding to the total $\chi^2$. For instance, for the three distributions
we have just tested, the total $\chi^2$ is 21.30, the total degrees of freedom
are 20, and P for this value of $\chi^2$ and $n' = 21$ is 0.38; thus, taken
altogether, the differences are still attributable to random errors. This is
the correct way of combining the experiences of several distributions,
and it is quite wrong to find an average P or $\chi^2$.

* The figures given in the paper add up to 373.98, and we have added
0.01 of a unit to each of the largest groups to make the total correct. There
are also two groups with expected frequencies of zero, but these must be

To illustrate further this additive character of $\chi^2$, Table 4.3 has
been compiled from some data of Mendel, quoted by Bateson (1913),
and gives the numbers of round and angular peas from ten plants,
together with the ratios of the numbers of round peas to the totals.


TABLE 4.3
Frequencies of Peas

Plant Number    Round     Angular    Total     Ratio       χ²
                (n_s)                (N)       (n_s/N)

1                45         12         57      0.7895     0.47
2                27          8         35      0.7714     0.09
3                24          7         31      0.7742     0.10
4                19         10         29      0.6552     1.39
5                32         11         43      0.7442     0.00
6                26          6         32      0.8125     0.67
7                88         24        112      0.7857     0.76
8                22         10         32      0.6875     0.67
9                28          6         34      0.8235     0.98
10               25          7         32      0.7812     0.17

Total           336        101        437
Expected        327.75     109.25     437.00


These ratios vary considerably, and we will see how far the variations
may be explained on the hypothesis that the peas from the
ten plants are effectively ten random samples from an infinite
population in which the ratio is 0.75 (this is the Mendelian expectation).
We might calculate for each plant the quantity

$$\frac{\text{Deviation in Ratio from } 0.75}{\text{Standard Error}}$$

and since the sampling distribution of each ratio is binomial, this
If the deviations were random, these ratios would be distributed
approximately normally with unit standard deviation, and none of the
ten would be expected to exceed 2.0. Alternatively we may calculate
ten values of $\chi^2$, and since there are two frequency groups, each
plant contributes one degree of freedom. In this simple case, $\chi^2$
happens to be the square of the above ratio.

Fisher gives the value of $\chi^2$ lying on the 0.05 level of significance
as 3.841 for one degree of freedom, and as none of those in Table
4.3 is as large as this, no individual plant differs significantly from
expectation. Further, for all plants combined, the proportion of
round peas is 0.7689, giving a $\chi^2$ of 0.96, which also is insignificant.

It may be, however, that the variability in the proportion of round
peas from plant to plant is greater than can be explained by random
errors, when all are considered together. To make this clear, we may
imagine an extreme case in which the values of $\chi^2$ for all the plants are
near the level of significance, but the ratio $n_s/N$ varies above and below
expectation, so that the ratio for all plants combined is near
expectation. Then, although no individual plant appears to differ
significantly from expectation, it is unlikely that all would be so near the
level of significance if it were not that the deviations as a whole were
real, but in different directions, so that when added they average
out. To test such a point we may add the values of $\chi^2$, giving a total
of 5.30 for 10 degrees of freedom ($n' = 11$) so that P = 0.87, and
we conclude from the data that the plants do not vary significantly,
and they may be regarded as so many random samples of peas.
Had there been enough plants, instead of adding the values of $\chi^2$
we could have formed a frequency distribution of them, and have
compared it with the theoretical form for one degree of freedom.

The $\chi^2$ test is thus an extension of the ordinary method of finding
the significance of a single binomial mean, and enables the information 
given by a number of them to be combined. 
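
The additive property can be illustrated with the pea data of Table 4.3. The following Python sketch is only a check on the arithmetic: the function names are hypothetical, and the probability for the total is obtained from the standard finite series for an even number of degrees of freedom instead of from the printed tables.

```python
import math


def chi_square_3_to_1(round_peas, angular_peas):
    """Chi-square (one degree of freedom) against the 3 : 1 expectation."""
    total = round_peas + angular_peas
    expected_round, expected_angular = 0.75 * total, 0.25 * total
    return ((round_peas - expected_round) ** 2 / expected_round
            + (angular_peas - expected_angular) ** 2 / expected_angular)


def chi_square_tail_even(x, g):
    """Probability of chi-square exceeding x when g is an even whole number."""
    h = x / 2.0
    term, total = math.exp(-h), 0.0
    for k in range(g // 2):
        total += term
        term *= h / (k + 1)
    return total


# (round, angular) for the ten plants of Table 4.3.
plants = [(45, 12), (27, 8), (24, 7), (19, 10), (32, 11),
          (26, 6), (88, 24), (22, 10), (28, 6), (25, 7)]

values = [chi_square_3_to_1(r, a) for r, a in plants]
total_chi2 = sum(values)           # values are added ...
total_df = len(plants)             # ... and so are the degrees of freedom
p_value = chi_square_tail_even(total_chi2, total_df)
print(round(total_chi2, 2), total_df, round(p_value, 2))
```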

4.31. It follows, of course, that if a number of values of $\chi^2$ and their
degrees of freedom may be added and treated as one, a total $\chi^2$
may be split up into parts, and each part may be tested separately.

Contingency Tables 

4.4. The problem of the sex-ratio of rats (in Table 3.5, p. 83) may
be looked at in a different way. We are not concerned with the total
diets, but with the distribution of young in the four cells giving the
two sex-ratios. If diet has had no effect on the sex-ratio, the 571
observations would be expected to be distributed at random in the
four cells, with the one restriction that they should add up to give
the totals of the table. In the infinite population of tables with those
totals, the probability of an observation falling in the "deficient"
group is 276/571, and that of it falling in the male group is 268/571,
so that the probability of it falling in the "deficient" male square is

$$\frac{268}{571} \times \frac{276}{571},$$

and the expected number of individuals in that square is that
probability multiplied by 571 = 129.54. Similarly, the other squares can
be filled as in Table 4.4; they are the frequencies that would be
expected if sex were independent of diet and the individuals were
distributed at random.


TABLE 4.4
Expected Frequencies

Diet                      Males      Females     Total

Vitamin B deficient       129.54     146.46      276.00
Vitamin B sufficient      138.46     156.54      295.00

Total                     268.00     303.00      571.00


We may now test to see if Table 3.5 differs
significantly from Table 4.4 by finding $\chi^2$ for the four cells, and then
the P for one degree of freedom. There is only one degree of freedom,
since only one cell can be filled independently; the numbers in the
others can be obtained from that one and the totals. In our example,
$\chi^2 = 1.20$ and P = 0.28 (from Fisher's table); the deviations are
not greater than can be attributed to random errors. It will be noticed
that this probability is practically the same as that obtained previously
from the standard error of the ratios (0.27); indeed, it should be, for
both methods are equivalent, being based on the assumption that
the number in a cell (or its ratio to the total) is distributed normally.
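
The expected frequencies of Table 4.4 follow from the marginal totals alone, and the rule can be sketched as below. The function name `expected_frequencies` is hypothetical; the totals are those quoted in the text.

```python
def expected_frequencies(row_totals, column_totals):
    """Expected cell frequencies when the two characters are independent."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


# Rows: vitamin B deficient, sufficient.  Columns: males, females.
expected = expected_frequencies([276, 295], [268, 303])
for row in expected:
    print([round(cell, 2) for cell in row])

# Only one cell of a fourfold table can be filled independently.
degrees_of_freedom = (2 - 1) * (2 - 1)
```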

Table 3.5 is an example of a fourfold contingency table. When the
individuals in a sample have two characters, and a frequency table
is made classifying them according to both so as to show the relation
between the characters, the result is a contingency table. In Table 3.5,
the two characters are sex and vitamin B content of diet, and complete
information regarding these for any one rat is given by its position
in the table. If the numbers in the cells are not randomly distributed,
the two characters are said to be associated; the sex of the offspring
in our example is not associated with the vitamin B content of the
diet of the parent. Contingency tables may be manifold, and if there
are n rows and m columns, there are $(n - 1)(m - 1)$ degrees of
freedom.


TABLE 4.5
Severity of Attack (expectations for independence in brackets)

Years since      Haemor-     Confluent   Abundant    Sparse      Very        Totals
vaccination      rhagic                                          Sparse

0-10               0           1           6          11          12           30
                  (0.87)      (5.13)      (9.02)      (8.60)      (6.38)

10-25              5          37         114         165         136          457
                 (13.25)     (78.20)    (137.45)    (130.96)     (97.14)

25-45             29         155         299         268         181          932
                 (27.04)    (159.47)    (280.32)    (267.07)    (198.10)

over 45           11          35          48          33          28          155
                  (4.50)     (26.52)     (46.62)     (44.42)     (32.94)

Unvaccinated       4          61          41           7           2          115
                  (3.34)     (19.68)     (34.59)     (32.95)     (24.44)

Total             49         289         508         484         359         1689


Table 4.5 is a 5 × 5 contingency table, being Brownlee's data of
the severity of smallpox attack and degree of vaccination (quoted
by Pearson, 1910); below the frequencies, the expectations for
independence are given in brackets.

As there are several expected frequencies less than ten in the first
row and column, these have been combined with the second row and
column in working out $\chi^2$, so forming a 4 × 4 table; $\chi^2 = 196.33$,
there are 9 degrees of freedom, and P is less than 0.000001; hence
the association between degree of vaccination and severity of attack
is overwhelmingly significant. Notice, however, that this gives no
information as to the strength of association; a smaller sample would
give a lower significance for the same association, and a larger sample
would give a higher significance. Indeed, this test does not even tell
us whether increased severity of attack is associated with a longer
or a shorter period since vaccination. Having established, however,
that the effect is real, we glance at the table again, and notice that
for the more recently vaccinated patients there tends to be a deficit
of severe attacks and an excess of milder ones; for the remotely
vaccinated, the reverse is the case, and thus we establish the sense
of the association; a measure of its strength will be discussed in
section 9.5. Readers must be warned that association does not
necessarily mean a causal relationship; it only means that in the
population sampled there is a tendency for variations in the
two characters to occur together. A much closer analysis is
necessary to investigate causes; this also will be discussed later
in section 7.3.
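
The working for Table 4.5 may be sketched as follows. The helper names are hypothetical, and the two smallest entries in the first row (0 and 1) are taken as implied by the marginal totals of the table. Because the expectations are not rounded here, the result differs slightly in the last figures from the 196.33 quoted.

```python
observed = [
    [0, 1, 6, 11, 12],          # 0-10 years since vaccination
    [5, 37, 114, 165, 136],     # 10-25
    [29, 155, 299, 268, 181],   # 25-45
    [11, 35, 48, 33, 28],       # over 45
    [4, 61, 41, 7, 2],          # unvaccinated
]


def pool_first_two(table):
    """Combine the first row with the second, and the first column with the second."""
    rows = [[a + b for a, b in zip(table[0], table[1])]] + [list(r) for r in table[2:]]
    return [[r[0] + r[1]] + r[2:] for r in rows]


def contingency_chi_square(table):
    """Chi-square and degrees of freedom for a test of independence."""
    row_totals = [sum(r) for r in table]
    column_totals = [sum(c) for c in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            e = row_totals[i] * column_totals[j] / grand_total
            chi2 += (n - e) ** 2 / e
    return chi2, (len(table) - 1) * (len(table[0]) - 1)


pooled = pool_first_two(observed)
chi2, degrees_of_freedom = contingency_chi_square(pooled)
print(pooled[0], round(chi2, 2), degrees_of_freedom)
```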

The test of Table 4.5 for association may be looked at in another
way. The table consists effectively of a number of frequency
distributions of years since vaccination, and $\chi^2$ is the measure of the
deviations of these distributions from hypothetical ones deduced
from the "totals" column and differing only in total frequencies.
The independent “constants” of these hypothetical distributions are 
four of the proportionate frequencies in the totals column ; the fifth 
may be obtained from the others, since they all add up to unity. 
In addition to these, the five totals in the last row give the five totals 
of the hypothetical distributions, so that altogether nine “constants” 
have been fitted, leaving 16 degrees for a 5 x 5 table. Alternatively, 
the table may be regarded as a collection of five distributions in 
which the variate is the severity of attack. 

For expressing and testing association, the contingency table is 
exceedingly useful, since the characters need only be described 
qualitatively, and need not be given in any particular order. A 
rearrangement of the rows or columns in Table 4.5 does not affect 
the result. 


4.41. A useful application of the $\chi^2$ test is to the comparison of a
number of frequency distributions, e.g. for the purpose of checking
sampling technique. The separate distributions may be regarded as
rows in a contingency table, and if the $\chi^2$ is large enough, they are
significantly different. As a special case of this, when there are two
distributions and $g'$ groups in each, there are $g' - 1$ degrees of
freedom, and the expression for $\chi^2$ becomes

$$\chi^2 = S\left\{\frac{N_1 N_2}{n_s + n_s'}\left(\frac{n_s}{N_1} - \frac{n_s'}{N_2}\right)^2\right\} \qquad (4.3)$$

where $N_1$ and $N_2$ are the two totals (they need not be equal),

$n_s$ and $n_s'$ are frequencies in one corresponding group in
the two distributions,

and S is the summation over all groups.

The grouping must, of course, be the same for both distributions 
under comparison. These tests are alternative to those involving 
only the means and standard deviations, but are more complete, since 
they compare the distributions in all respects. 
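
That the comparison of two distributions is a special case of the contingency table may be verified numerically. In the sketch below the two frequency distributions are hypothetical, the function names are illustrative, and the two-distribution expression is written in the algebraically equivalent form $N_1 N_2\, S\{(n_s/N_1 - n_s'/N_2)^2/(n_s + n_s')\}$.

```python
def two_distribution_chi_square(first, second):
    """Chi-square for two distributions grouped in the same way."""
    n1, n2 = sum(first), sum(second)
    return n1 * n2 * sum(
        (a / n1 - b / n2) ** 2 / (a + b) for a, b in zip(first, second)
    )


def contingency_chi_square(table):
    """Chi-square from observed and expected frequencies of a contingency table."""
    row_totals = [sum(r) for r in table]
    column_totals = [sum(c) for c in zip(*table)]
    grand_total = sum(row_totals)
    return sum(
        (n - row_totals[i] * column_totals[j] / grand_total) ** 2
        / (row_totals[i] * column_totals[j] / grand_total)
        for i, row in enumerate(table)
        for j, n in enumerate(row)
    )


# Hypothetical frequencies in four corresponding groups; totals need not be equal.
first = [12, 30, 45, 13]
second = [20, 35, 30, 25]

direct = two_distribution_chi_square(first, second)
as_table = contingency_chi_square([first, second])
degrees_of_freedom = len(first) - 1
print(round(direct, 4), round(as_table, 4), degrees_of_freedom)
```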


General Notes on the Distribution of $\chi^2$

4.5. The distribution of $\chi^2$ is of considerable general importance in
statistical theory, and a proper understanding of its nature is desirable.
We have introduced it as it first appeared historically, as a measure
of the differences between two frequency distributions. More
generally, however, if we have any quantity x which is distributed normally
about zero with unit standard deviation, and we add the squares of
g independent values of x, the distribution of this sum in the infinite
population is that of $\chi^2$ for g degrees of freedom.

Thus, the distribution of $\chi^2$ is closely related to that of N times
the variance, in samples of N from a normal population in which
the standard deviation is unity. Indeed, the sum of squares of g
independent deviations from the population mean is equivalent to $g + 1$
deviations from the sample mean, so that if in equation (2.8), p. 66,
we write

$$N = g + 1, \quad s^2 = \frac{\chi^2}{g + 1} \quad \text{and} \quad \sigma = 1$$

in the distribution of s, the equation reduces to the form of (4.2).

Tables of the probability integral of $\chi^2$ only go up to $g = 30$, but
for larger values of g it is sufficient to assume $\sqrt{2\chi^2}$ to be normally
distributed about a mean of $\sqrt{2g - 1}$ with unit standard deviation




(Fisher, 1936a). Notice, however, that here we are asking not "is
the observed $\sqrt{2\chi^2}$ significantly different from the mean $\sqrt{2g - 1}$?"
but rather "is the observed $\sqrt{2\chi^2}$ significantly greater than zero?"
and consequently the deviation of $\sqrt{2\chi^2}$ which has a tail 5 per cent.
of the whole curve is equivalent to the 5 per cent. level of
significance. Such a deviation is 1.65 times the standard error, and
$\sqrt{2\chi^2} - \sqrt{2g - 1}$ has to be greater than 1.65 to be significant.
When g = 30, $\chi^2$ lying on the 5 per cent. level on this
assumption is 43.5, whereas Fisher's tables based on the correct
distribution give 43.773, so the approximation is close enough for large
samples.
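
The figure of 43.5 may be recovered directly from the approximation. The sketch below, with a hypothetical function name, solves $\sqrt{2\chi^2} - \sqrt{2g - 1} = 1.65$ for $\chi^2$.

```python
import math


def approximate_chi_square_level(g, normal_deviate=1.65):
    """Chi-square at which sqrt(2 * chi2) - sqrt(2g - 1) equals the normal deviate."""
    return (normal_deviate + math.sqrt(2 * g - 1)) ** 2 / 2


approximate = approximate_chi_square_level(30)
exact_from_tables = 43.773  # value quoted from Fisher's tables
print(round(approximate, 1), round(exact_from_tables - approximate, 2))
```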



CHAPTER V 


SMALL SAMPLES 

It is not always possible to obtain large samples, such as are suitable 
for application of the methods of the last chapters, so the theory has 
been developed to give more exact methods suitable for small
numbers. We stated in section 3.1 that there is nothing in the definition
of a random sample that depends on its size, and would reiterate 
this in view of the existence of a fairly common impression to the 
contrary, of which the following quotation is typical. 

“Moreover, in agricultural experiment, the number of observations 
rarely, if ever, is sufficiently large to allow full play to the laws of 
chance.”

This is a misconception, for the laws of chance (as we know them) 
have full play every time a single individual is drawn from the popu- 
lation, and the theory of random sampling may be applied to small 
samples. There are two kinds of modification of the theory for large 
samples. The first consists in developments to the theory of esti- 
mation, which do not alter the general principles described in 
section 3.7. We need not concern ourselves with them any further.
The second kind of modification is of practical importance and 
consists in using exact sampling distributions instead of the approxi- 
mate ones used with large samples. We shall deal with this more fully. 

Unfortunately, although the limitation of size has been removed,
that of form has not, and most of the following results apply only to
quantities distributed normally. 

Variance Estimated from Small Samples 
5.1. With a few observations, it is futile to form a frequency
distribution, but the usual frequency constants may be calculated, and
regarded as estimates, obtained from the sample, of the constants of
the infinite population. For the normal population, the mean is found
in exactly the same way as for large samples, but the best estimate
of the variance ($\sigma^2$) is obtained by dividing the sum of the squares
of the deviations from the mean, not by the number of observations, 
but by the number of degrees of freedom. Here the number of degrees 
of freedom is the number of deviations minus the number of con- 
stants determined from the sample and used to fix the points from 
which those deviations are measured; in the simple case, when the
mean only is found from the sample, the degrees of freedom are one 
less than the number of observations. We will illustrate this by the 
data of Table 1.3, which are the 100 random observations from an 
artificially constructed population, divided into groups of 5. We may 
find the variance either from the squares of the deviations from 
the grand mean or, regarding each group of five as a small sample, 
from the squares of the deviations from the sample means. For 
the latter process, instead of finding the variance for each sample 
separately, we may sum the squares of all the deviations and divide 
by the total degrees of freedom contributed by the twenty samples.
The sum of squared deviations from the grand mean is 8864.75,
and on dividing this by 99 we obtain the variance, viz. 89.5. The
sum of squares from the sample means is 7056.4, and since each
sample contributes four degrees of freedom, there are 80 degrees
altogether and the variance is 88.3. There is quite a fair agreement
between the two estimates.* If we had used the old method for 
large samples we should have obtained values 


$$\frac{8864.75}{100} = 88.6 \quad \text{and} \quad \frac{7056.4}{100} = 70.6,$$

with a much poorer agreement. In a large sample, the difference
between dividing by N and (N − 1) is quite unimportant.

As a special case, it may be determined that the mean variance for
a number of pairs is

$$\frac{S(x_1 - x_2)^2}{2M}$$

where $x_1$ and $x_2$ are the individuals of any pair, M is the number
of pairs and S is the summation over all pairs. In this way, the
variance of any character between brothers from the same parents can
be obtained as accurately from pairs of brothers from a hundred
families as from a single family of one hundred and one brothers.
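
The formula for pairs is the general rule of dividing the sum of squares by the degrees of freedom, applied to samples of two. The following sketch, with hypothetical observations and function names, confirms that the two routes give the same figure.

```python
def variance_from_pairs(pairs):
    """S(x1 - x2)^2 / (2M), where M is the number of pairs."""
    return sum((x1 - x2) ** 2 for x1, x2 in pairs) / (2 * len(pairs))


def pooled_variance(samples):
    """Sum of squares about each sample mean, divided by total degrees of freedom."""
    sum_of_squares = 0.0
    degrees_of_freedom = 0
    for sample in samples:
        mean = sum(sample) / len(sample)
        sum_of_squares += sum((x - mean) ** 2 for x in sample)
        degrees_of_freedom += len(sample) - 1   # one constant fitted per sample
    return sum_of_squares / degrees_of_freedom


pairs = [(5.1, 4.7), (6.0, 6.4), (5.5, 4.9), (4.8, 5.0)]  # hypothetical pairs
from_pairs = variance_from_pairs(pairs)
from_pooling = pooled_variance(pairs)
print(round(from_pairs, 4), round(from_pooling, 4))
```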


5.11. The justification for using the new estimate when the
deviations are measured from the mean arises mathematically from the

* We have not tested the agreement by the ordinary sampling theory, 
using the standard errors of the values, since the values are not independent; 
they have been taken from the same data. 
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sampling distribution of the variance. If v' is the sample value of 
the variance as defined in section 1.23 and 5 is the corresponding 
standard deviation, 


s = '\/ ds 


I dv ' 


2 v'- 


V 


and the sampling distribution of may easily be derived from that 
given for s in section 2,72; it is 

N - 3 W 


where is a constant and iV is the size of sample. The mean 
value of v' obtained by integrating this distribution is 


AT- 


iV 


Thus, as an estimate of t?' tends on the average to give a biassed 
value that is slightly smaller than the population value. It is prefer- 
able in dealing with small samples to use the unbiassed estimate, 

Nv' S{x — ^ 

® - W^x ^ N-i 
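
The factor $(N - 1)/N$ in the mean value of $v'$ can be illustrated without any integration by listing every possible sample from a very small population. The sketch below assumes a hypothetical population of three equally likely values, sampled with replacement; it is an illustration of the bias only and not the derivation from the normal sampling distribution used in the text.

```python
from fractions import Fraction
from itertools import product

def biased_variance(sample):
    """v': sum of squared deviations from the sample mean, divided by N."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / n

# Hypothetical population of three equally likely values, chosen only so
# that every possible sample can be listed.
population = [0, 2, 4]
pop_mean = Fraction(sum(population), len(population))
pop_variance = sum((x - pop_mean) ** 2 for x in population) / len(population)

N = 2                                            # size of sample
samples = list(product(population, repeat=N))    # all equally likely samples
mean_v_biased = sum(biased_variance(s) for s in samples) / len(samples)
mean_v_unbiased = mean_v_biased * Fraction(N, N - 1)

print("population variance:", pop_variance)       # 8/3
print("mean of v':         ", mean_v_biased)       # 4/3, i.e. (N - 1)/N of 8/3
print("mean of N v'/(N-1): ", mean_v_unbiased)     # 8/3
```

Exact fractions are used so that the equality holds without rounding error.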


Significance of Means: The t Test 

5.2. When testing the significance of the deviation of a sample mean 
from an assumed population value, we used the fact that the ratio 
of the deviation to its standard error is distributed normally with 
unit standard deviation.* If $d$ is the deviation, $\sigma$ is the population 
value of the standard deviation and $N$ is the size of sample, this 
ratio is 

$$\frac{d}{\sigma/\sqrt{N}}.$$

Now, in practice we do not always know $\sigma$ and have to substitute 
the sample estimate, $s$. When dealing with large samples, $s$ and $\sigma$ 
are sufficiently alike to make this course reasonably accurate; but 
when the samples are small, some allowance must be made for the 
error involved in using $s$ instead of $\sigma$. 

* The ratio is the w of equation (2.5), section 2.51. 

The ratio by which we test the significance of a deviation may 
be written 

$$t = \frac{d\sqrt{N}}{s},$$

and if we know the sampling distribution of $t$, we may carry out 
the test exactly, without being limited to large samples. This was 
first pointed out by “Student” (1908) and he gave some tables of 
the sampling distribution of a quantity $z$ closely related to $t$. These 
have now been generally superseded by tables of $t$ given by “Student” 
(1925) and Fisher (1936a). When computing $t$ for use with Fisher’s 
tables, $s$ must be taken as the square root of the variance calculated 
correctly from the sample by dividing the sum of squares of deviations 
by the degrees of freedom. Thus, 

$$s^2 = \frac{S(x - \bar{x})^2}{N - 1}, \qquad d = \bar{x} - \xi,$$

where $\xi$ is the population mean. 

The sampling distribution of $t$ is symmetrical and depends only 
on the number of degrees of freedom on which $s$ has been estimated; 
it approaches the normal form as the degrees of freedom increase. 
Fisher’s tables give, for different degrees of freedom, called $n$, values 
of $t$ lying on various levels of significance, the levels being the sum 
of the two “tails” of the distribution as for the third column of our 
Table 2.5. 

For large samples, $t = 2.0$ lies on the 0.05 level of significance; 
other values that lie on the same level are: $t = 12.7$ for samples 
of 2 ($n = 1$), $t = 4.3$ for those of 3 ($n = 2$), $t = 3.2$ for those of 4 
($n = 3$) and 2.3 for those of 10 ($n = 9$); the effect of the non-
normality of $t$ is serious for samples much smaller than 20. 

Table 5.1 shows the effect of a small electric current on the 
growth of maize seedlings, giving the difference between the elongation 
of the treated and untreated in parallel pairs of boxes (data from 
Collins, Flint and McLane, 1929), a positive difference showing that 
the electrical treatment increased the rate of growth. The mean is 
4.29 mm.; is this significantly different from zero? The sum of 
squares of deviations from the mean is 937.189, and since on our 
hypothesis $\xi = 0$, 

$$t = \frac{4.29\sqrt{10}}{\sqrt{937.189/9}} = 1.33,$$

and $n = 9$; according to the tables, $t = 1.38$ lies on the 0.2 level 
of significance, and there is thus no evidence from the sample that 
the treatment has made any difference to growth. The mean elongation 
is not large enough compared with the variations between those 
of the separate boxes to be significant. 
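
The calculation of $t$ for the maize seedlings needs only the summary figures quoted in the text. A minimal sketch follows; the function name and its arguments are illustrative, and the number of observations (10) is inferred from the 9 degrees of freedom.

```python
import math

def one_sample_t(mean, hypothesised_mean, sum_sq_dev, n_obs):
    """Return (t, degrees of freedom) for the deviation of a sample mean.

    sum_sq_dev is the sum of squared deviations from the sample mean.
    """
    df = n_obs - 1
    s = math.sqrt(sum_sq_dev / df)      # estimate of the standard deviation
    t = (mean - hypothesised_mean) * math.sqrt(n_obs) / s
    return t, df

# Summary figures quoted for Table 5.1.
t, df = one_sample_t(mean=4.29, hypothesised_mean=0.0,
                     sum_sq_dev=937.189, n_obs=10)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

The value obtained falls just short of 1.38, the tabled value for the 0.2 level with 9 degrees of freedom.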

TABLE 5.1 

Elongation in Mm. (Treated − Untreated). 

   6.0    1.3   10.2   23.9    3.1    6.8   −1.5  −14.7    3.3 

Mean 4.29 

Essential Character of t 

5.21. Essentially, $t$ is the ratio of a quantity distributed normally 
about a mean of zero, to an estimate based on n degrees of freedom, 
of the standard error of that quantity. Any such ratio, however it 
arises, has the same sampling distribution as t. 

Significance of Differences between Means 
5.3. The quantity $t$ may be used for testing the significance of the 
difference between two sample means. We shall deal here only with 
the situation in which the assumption is made that the samples 
are from populations having a common standard deviation $\sigma$ as well 
as the same mean.* Then, if $N_1$ and $N_2$ are the numbers in the 
two samples, and $\bar{x}_1$ and $\bar{x}_2$ are the sample means, $(\bar{x}_1 - \bar{x}_2)$ is 
distributed normally about zero and an estimate of its standard 
deviation (or error) is 

$$s\sqrt{\frac{1}{N_1} + \frac{1}{N_2}},$$

where $s$ is obtained by summing the squares of the deviations from 
the two sample means, dividing by the total degrees of freedom 
and finding the square root. Thus 

$$s^2 = \frac{S_1(x_1 - \bar{x}_1)^2 + S_2(x_2 - \bar{x}_2)^2}{(N_1 - 1) + (N_2 - 1)} \qquad (5.\ )$$

where $S_1$ and $S_2$ are the summations over the two samples and 
$x_1$ and $x_2$ are individuals in the two samples. Then 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}},$$

which in large samples is distributed normally with unit standard 
deviation, is in small samples distributed as Fisher’s $t$, the degrees 
of freedom being 

$$n = (N_1 - 1) + (N_2 - 1).$$

Our example is from data provided by Corkill (1930) showing the 
effect of insulin on rabbits; the results for separate animals are given 
in Table 5.2. The difference in means is not large compared with 
the variations within each sample, and a statistical test of significance 
is necessary. The sums of squares of the deviations are 0.253 0 
and 0.071 5, and thus 

$$s^2 = \frac{0.324\,5}{19} = 0.017\,08 \quad\text{and}\quad s = 0.130\,7;$$

there are ten and eleven rabbits in the two samples, so 

$$t = \frac{0.273 - 0.135}{0.130\,7\sqrt{\dfrac{1}{10} + \dfrac{1}{11}}} \approx 2.4.$$

* Since normality is also assumed, the hypothesis and assumptions 
amount to the composite hypothesis that the two populations are one. 
Judging from experience, however, the tests are not very sensitive to 
moderate departures from normality nor to small differences in standard 
deviation. 


TABLE 5.2 

Muscle Glycogen (per cent.) 

          Controls     After Insulin 
            0.19           0.15 
            0.18           0.13 
            0.21           Trace* 
            0.30           0.07 
            0.66           0.27 
            0.42           0.24 
            0.08           0.19 
            0.12           0.04 
            0.30           0.08 
            0.27           0.20 
                           0.12 
Means       0.273          0.135 

* Assumed to be 0.00. 


The mean variance has been calculated on 19 degrees of freedom, 
and for these conditions, $t = 2.09$ lies on the 0.05 level of significance; 
thus the effect of insulin on muscle glycogen appears to 
be real. 
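
The whole test can be retraced from the individual readings of Table 5.2. The sketch below is one way of setting out the arithmetic; the helper names are illustrative, and “Trace” is entered as 0.00 in accordance with the footnote to the table.

```python
import math

controls = [0.19, 0.18, 0.21, 0.30, 0.66, 0.42, 0.08, 0.12, 0.30, 0.27]
# "Trace" is entered as 0.00, as in the footnote to Table 5.2.
insulin = [0.15, 0.13, 0.00, 0.07, 0.27, 0.24, 0.19, 0.04, 0.08, 0.20, 0.12]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_t(a, b):
    """t for the difference of two means, assuming a common variance.

    Returns (t, degrees of freedom, pooled variance).
    """
    df = (len(a) - 1) + (len(b) - 1)
    s2 = (sum_sq_dev(a) + sum_sq_dev(b)) / df
    se = math.sqrt(s2) * math.sqrt(1 / len(a) + 1 / len(b))
    return (mean(a) - mean(b)) / se, df, s2

t, df, s2 = pooled_t(controls, insulin)
print(f"sums of squares: {sum_sq_dev(controls):.4f}, {sum_sq_dev(insulin):.4f}")
print(f"s^2 = {s2:.5f}, t = {t:.2f}, n = {df}")
```

The value of $t$ exceeds 2.09, in agreement with the conclusion drawn in the text.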

The difference between this and the earlier example of the effect 
of electrical treatment on growth is that here there is no reason for 
taking controls and treated animals in pairs, they are all independent ; 
in the former instance, boxes were treated in parallel pairs, and there 
was some reason for expecting that these would be subject to some 
of the same disturbing factors, which thus would not affect the 
differences. 

Significance of Differences between Variances 

5.4. When testing the difference between variabilities for large 
samples in section 3.53, we assumed $(s_1 - s_2)$, the difference in 
standard deviations of the samples, to be distributed normally with 
a standard error of 

$$\sigma\sqrt{\frac{1}{2N_1} + \frac{1}{2N_2}},$$

where $\sigma$ is the standard deviation in the population from which the 
samples are presumed to be taken; and not knowing $\sigma$, we substituted 
$s_1$ and $s_2$ in the expression. Errors arising from this approximation 
are again important in small samples, but Fisher (1924a and 1936a) 
has suggested a new index: 

$$z = \tfrac{1}{2}(\log_e s_1^2 - \log_e s_2^2) = \log_e \frac{s_1}{s_2} \qquad (5.4)$$

where $s_1^2$ and $s_2^2$ are the two variances calculated on the degrees of 
freedom. For samples of $N_1$ and $N_2$ drawn from the same population, 
$z$ is distributed in the form 

$$A\,\frac{e^{n_1 z}}{(n_1 e^{2z} + n_2)^{\frac{1}{2}(n_1 + n_2)}}\,dz,$$

where $n_1$ and $n_2$ are the degrees of freedom ($= N_1 - 1$ and $N_2 - 1$) 
and $A$ is a constant.* This distribution, containing as variables 
only $z$, $n_1$ and $n_2$, is independent of the standard deviation of the 
population, $\sigma$, and since it involves no approximating assumptions, 
is applicable to small samples. The quantity $z$ may vary between 
plus and minus infinity, being negative when $s_1/s_2$ is less and positive 
when $s_1/s_2$ is greater than unity, and unless $n_1 = n_2$, is skew. 
The positive part of the curve of $z = \log_e(s_1/s_2)$, however, is the same as 
the negative part of $z = \log_e(s_2/s_1)$, and so the probability integrals for 
positive deviations only are sufficient for any combination of degrees 
of freedom; the others can be obtained by interchanging $n_1$ and $n_2$. It 
is simpler, however, not to deal with negative values of $z$, but always 
to take the difference of logarithms so that it is positive, and hence to 
choose $n_1$ to be the degrees of freedom on which the larger variance 
is measured. 

* This distribution of $z$ is related to that of $s$ and hence to that of $\chi^2$ (see 
Fisher, 1924a). 

Fisher (1936a) gives values of $z$ at which ordinates cut 
off “tails” of 5, 1 and 0.1 per cent. of the total area of the curve 
for values of $n_1$ and $n_2$ chosen according to the above convention.* 
These points, however, are not the corresponding levels of significance 
as we understand them. The hypothesis is that the two variances 
are estimates of one and the same population value, and that $z$ 
(which is the difference between the natural logarithms of the two 
standard deviations) is zero. Applying the arguments of section 3.32, 
we regard as lying on the 0.05 level of significance values of $z$ at 
which ordinates cut off tails, each of which is about 0.025 of the 
whole area; the 5 per cent. level of significance, therefore, lies somewhere 
between the 5 and 1 per cent. points of Fisher’s tables, unless 
there is some a priori reason for supposing the variance estimated 
by $s_1^2$ is either equal to or greater than that estimated by $s_2^2$ (compare 
p. 77). 

* Mahalanobis (1932) gives similar tables for the 5 and 1 per cent. points 
of $s_1/s_2$ and $s_1^2/s_2^2$, and by using these, the rather troublesome transformation 
of equation (5.4) may be avoided. Tables D1 and D2 at the end of this 
book give corresponding values of $\log_{10}(s_1^2/s_2^2)$. 

When $n_1$ and $n_2$ are large, or when they are moderate and nearly 
equal, the distribution of $z$ becomes nearly normal with a standard 
error of 

$$\sqrt{\frac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

and in such circumstances a value of $z$ greater than twice this value 
lies above the 5 per cent. level of significance. For example, when 
$n_1 = n_2 = 24$, this standard error becomes 0.204, and a $z$ of 0.408 
lies on the 5 per cent. level (that is between 0.342 5 and 0.489 0, 
at which according to Fisher’s tables $P = 0.05$ and $0.01$). In order 
to check the assumption of normality, we may compare the deviation 
at which an ordinate cuts off a tail of 5 per cent. of the whole curve 
(1.65 times the standard error = 1.65 × 0.204 = 0.337) with 
0.342 5; the agreement is quite good. 

For the heights of Englishmen and Scotsmen of Table 3.1, p. 74, 
the standard deviations are 2.548 and 2.480, so $z = \log_e 1.027 
= 0.026\,6$, and using the above approximation its standard error is 

$$\sqrt{\frac{1}{2}\left(\frac{1}{6\,194} + \frac{1}{1\,304}\right)} = 0.021\,5;$$

$z$ is only 1.23 times its standard error and is not significant. The 
other test in section 3.53 gave the difference in standard deviations 
as 1.27 times its standard error, and this shows that in very large 
samples it is as well to use the difference between standard deviations 
as $z$. 
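
The normal approximation to the standard error of $z$ is easily evaluated. The sketch below takes $n_1 = n_2 = 24$ for the first example and sample sizes of 6 194 and 1 304 for the heights, these being the values that reproduce the standard errors 0.204 and 0.021 5 quoted in the text; the function name is illustrative.

```python
import math

def z_standard_error(n1, n2):
    """Approximate standard error of z when n1 and n2 are large or nearly equal."""
    return math.sqrt(0.5 * (1 / n1 + 1 / n2))

# n1 = n2 = 24: compare the normal approximation with the tabled 5 per cent point.
se_24 = z_standard_error(24, 24)
approx_5pc_point = 1.65 * se_24      # deviate cutting off one tail of 5 per cent
print(round(se_24, 3), round(approx_5pc_point, 3))      # 0.204 0.337

# Heights: the sample sizes stand in for the degrees of freedom, from which
# they differ by only one.
se_heights = z_standard_error(6194, 1304)
z_heights = math.log(1.027)          # ratio of standard deviations as rounded in the text
print(round(z_heights, 4), round(se_heights, 4))        # 0.0266 0.0215
```

On these figures $z$ for the heights is a little over 1.2 times its standard error.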

As an illustration of a small sample, we will test the difference in 
variability of muscle glycogen between the controls and insulin-
treated rabbits of Table 5.2. The sums of squares are 0.253 0 and 
0.071 5, the variances are 0.028 11 and 0.007 15, $z = \tfrac{1}{2}\log_e 3.93 
= 0.684$, $n_1 = 9$ and $n_2 = 10$. 

Fisher’s tables do not give $z$ for these values of $n_1$ and $n_2$, but 
interpolation is made easy by the fact that for any one value of $n_2$, 
changes in $z$ are nearly proportional to $1/n_1$. Linear interpolation 
with respect to $1/n_1$ is therefore appropriate. We shall write $z_5$ and 
$z_1$ for values on the 5 and 1 per cent. points. 

Then, for $n_2 = 10$, 

when $n_1 = 8$,    $1/n_1 = 0.125\,00$, $z_5 = 0.561\,1$, and $z_1 = 0.810\,4$; 

when $n_1 = 12$,   $1/n_1 = 0.083\,33$, $z_5 = 0.534\,6$, and $z_1 = 0.774\,4$; 

and the differences are: 

diff.$(1/n_1) = 0.041\,67$, diff.$(z_5) = 0.026\,5$, and diff.$(z_1) = 0.036\,0$. 

Hence, when $n_1 = 9$, 

$$\frac{1}{n_1} = 0.111\,11, \qquad z_5 = 0.561\,1 - \frac{0.013\,89}{0.041\,67} \times 0.026\,5 = 0.552\,3,$$

$$\text{and}\quad z_1 = 0.810\,4 - \frac{0.013\,89}{0.041\,67} \times 0.036\,0 = 0.798\,4.$$

The value $z = 0.684$ lies between the 5 and 1 per cent. points, but 
it is doubtful if it is on the 5 per cent. level of significance, and we 
can only say that although the data suggest that the effect of insulin 
has been to make the muscle glycogen percentages more regular, 
further observations are necessary to establish the fact. 

Interpolation with respect to $n_2$ is scarcely ever necessary, since 
the tables are given for $n_2$ increasing by units between 1 and 30. 
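
The computation of $z$ and the interpolation with respect to $1/n_1$ can be set out as below. This is a sketch only: the tabled values are those quoted above for $n_2 = 10$, and the function names are illustrative.

```python
import math

def z_index(var_larger, var_smaller):
    """Fisher's z: half the difference of the natural logarithms of two variances."""
    return 0.5 * (math.log(var_larger) - math.log(var_smaller))

def interpolate_in_reciprocal(n1, n_low, z_low, n_high, z_high):
    """Linear interpolation of a tabled z with respect to 1/n1.

    z_low and z_high are the tabled values at n_low and n_high degrees of freedom.
    """
    fraction = (1 / n_low - 1 / n1) / (1 / n_low - 1 / n_high)
    return z_low - fraction * (z_low - z_high)

z = z_index(0.02811, 0.00715)                               # n1 = 9, n2 = 10
z5 = interpolate_in_reciprocal(9, 8, 0.5611, 12, 0.5346)    # 5 per cent point
z1 = interpolate_in_reciprocal(9, 8, 0.8104, 12, 0.7744)    # 1 per cent point
print(f"z5 = {z5:.4f}, z1 = {z1:.4f}")                      # z5 = 0.5523, z1 = 0.7984
print("z lies between the two points:", z5 < z < z1)        # True
```

Working from the unrounded ratio of the variances gives a $z$ that differs from 0.684 only in the fourth decimal place.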


Significance of Variations between Several Samples 
5.5. The method of section 5.3 tests the hypothesis that two samples 
are from populations having the same mean, assuming they have 
the same standard deviation. When there are more than two 
samples, the hypothesis may best be tested by the analysis of 
variance described in the next chapter. 

It is sometimes useful to test the significance of the general 
differences between the variances of more than two samples. For 
instance, it may be desired to combine data from several experi- 
ments or sources, and before doing so, it is advisable to ascertain 
that the variance within each set of data is sensibly the same; or 
again, when examining the products of a factory statistically, any 
difference in the variability of the articles produced on different 
machines, sections, or times is taken as evidence of lack of control. 

Neyman and E. S. Pearson (1931) have proposed an index for 
testing this. If there are $k$ estimates of variance, $v_1, v_2, \ldots, v_k$, each 
based on $N$ observations ($n = N - 1$ are the degrees of freedom), 
the index is* 

$$L_1 = \frac{(v_1 v_2 \cdots v_k)^{1/k}}{\frac{1}{k}(v_1 + v_2 + \cdots + v_k)} \qquad (5.5)$$

Thus, $L_1$ is the ratio of the geometric to the arithmetic mean of the 
variances. Mahalanobis (1933) and Nayer (1936) have tabled values 
of $L_1$ lying on the 5 and 1 per cent. levels of significance for various 
values of $k$ and $N$.† Nayer’s tables are fuller than the others. One 
property of $L_1$ is that it decreases in value as the variances differ 
more among themselves; it has a maximum value of unity when 
the variances are equal. 

In a series of six experiments (Tippett, 1934), the following six 
variances of cotton yarn breakages were obtained: 0.3056, 0.5578, 
0.6489, 0.3378, 0.6194 and 0.4311; each was based on 9 degrees 
of freedom. Before combining the results of these experiments, it 
was considered necessary to test the variances for homogeneity. 
Here, $k = 6$, $N = 10$ and $L_1 = 0.930$. From Nayer’s tables, 
$L_1 = 0.81$ lies on the 5 per cent. level, and remembering that a 
high $L_1$ corresponds to greater uniformity of variance, we conclude 
that the value of 0.93 is not significant. 

* Our notation differs from that of Neyman and Pearson. 

† Mahalanobis and Nayer write $n$ for the number of observations in each 
sample, corresponding to our $N$. In applying this test, we shall consistently 
use the small letter for degrees of freedom. Nayer writes $f$ for our $n$. 
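
The behaviour of the index, unity for equal variances and falling as they diverge, can be seen from a short computation. The sketch below uses hypothetical sets of variances (they are not the cotton yarn figures) and assumes that every estimate rests on the same number of observations.

```python
import math

def l1_index(variances):
    """Ratio of the geometric mean to the arithmetic mean of the variances."""
    k = len(variances)
    geometric = math.exp(sum(math.log(v) for v in variances) / k)
    arithmetic = sum(variances) / k
    return geometric / arithmetic

# Hypothetical variance estimates (for illustration only).
equal = [0.5, 0.5, 0.5, 0.5]
mild = [0.4, 0.5, 0.6, 0.5]
wide = [0.1, 0.5, 1.2, 0.2]

for label, vs in (("equal", equal), ("mild", mild), ("wide", wide)):
    print(label, round(l1_index(vs), 3))
```

The geometric mean is taken through logarithms, which avoids forming the product of many small numbers directly.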

Small Samples from the Binomial and Poisson Distributions 
5.6. In Chapter II we used the binomial and Poisson series to test 
the homogeneity of a large number of counts of seeds and yeast 
cells ; but it is not always possible to obtain such large samples from 
one population. However, if there are small samples of replicate 
counts from a variety of populations, the experience so obtained can 
be reduced to a common measure and added. 

5.61. We will develop the tests for the binomial by reference to 
Table 4.3. The numbers of round and angular peas and the totals 
may be regarded as forming a 2 × 10 contingency table, and obtaining 
the expectations from the totals (i.e. ignoring the Mendelian 
expectation), it may be tested for randomness by means of the $\chi^2$ 
calculated on 9 degrees of freedom as shown in section 4.4. If 
this value of $\chi^2$ is significant, the data are not random nor homogeneous. 
If there are several such tables obtained (say) from a 
number of varieties of peas in which the expectations are not necessarily 
equal, all the values of $\chi^2$ and their degrees of freedom may be 
added, and the sum tested in the usual way. Further, there may 
be enough tables with the same number of degrees of freedom (say 
a hundred or more) to form a frequency distribution of $\chi^2$, and this 
may be compared with the theoretical form obtained from the 
standard tables. The constant $\chi^2$ is thus the common measure to 
which a variety of experiences may be reduced. 

The expression for $\chi^2$ becomes particularly simple if the number 
of peas from each plant is constant. Using more general language, if 
there are $g'$ parallel sets, each of $n$ trials, with $m_1, m_2, \ldots$ successes 
in the sets and a mean of $m$ successes,* then 

$$\chi^2 = \frac{S(m_s - m)^2}{m\left(1 - \dfrac{m}{n}\right)} \qquad (5.6)$$

and there are $(g' - 1)$ degrees of freedom. This, like the $\chi^2$ obtained 
in section 4.3 under a slightly different assumption, is a measure of 
the variation in $m_s$ between plants. If we had a number of such 
sets of 10 for different kinds of plants, we could find the $\chi^2$ for 
each set, and compare their distribution with the theoretical one 
for 9 degrees of freedom. In the same way, for instance, the uniformity 
of conditions in seed germination can be investigated from 
a large experience of many sorts of seed, as long as there is the 
same number of replicates ($g'$) on each seed. 

* In Table 4.3 the $g'$ sets are the 10 plants, the $n$ trials are the total peas 
on each plant (we are now dealing with the case where $n$ is constant), 
and the $m_1, m_2, \ldots$ successes are the numbers of round peas. 

In one series of experiments made to find the effect of various 
treatments on cotton seeds, 100 seeds were exposed in two lots of 
50 for each treatment. In such a case, if $m_1$ and $m_2$ are the numbers 
germinating in the two lots of 50, 

$$\chi^2 = \frac{(m_1 - m_2)^2}{(m_1 + m_2)\left(1 - \dfrac{m_1 + m_2}{100}\right)},$$

and there is one degree of freedom ($g' = 2$). There were 52 treatments, 
giving a total $\chi^2$ of 82.84. The tables do not extend to so many 
degrees of freedom, so we may use the approximation mentioned in 
section 4.5, assuming 

$$\sqrt{2\chi^2} - \sqrt{2g - 1}$$

to be normally distributed with unit standard deviation. Here $g = 52$, 
and 

$$\sqrt{2\chi^2} - \sqrt{2g - 1} = 2.72,$$

corresponding to $P = 0.003$; this suggests that there must have been 
some lack of uniformity in conditions. The frequency distribution 
of $\chi^2$ and the theoretical form as found from Fisher’s table for one 
degree of freedom are compared in Table 5.3; as Fisher gives 
only a few probability integrals, the intervals of $\chi^2$ are unequal. 
There are obviously too many pairs of counts with large values of 
$\chi^2$, and to test the divergences we must use the rather overworked 
constant, $\chi^2$, again, this time calculating, as shown in section 4.1, a $\chi'^2$ 
according to equation (4.1) as a measure of the differences between 
the two distributions of Table 5.3.* In doing this, we have combined 
the groups in pairs according to the brackets, and obtain 
$\chi'^2 = 7.86$, which for three degrees of freedom (i.e. four combined 
groups and only the total of the theoretical distribution has been 
adjusted) lies almost exactly on the 0.05 level of significance. This 
test of randomness is less stringent than the one above adding all 
the values for $\chi^2$, for it groups all those over 2.706 together, and 
takes no account of how much greater they may be. 

* This double use of $\chi^2$ may be a little confusing, but if readers can for 
the moment forget that the variate of Table 5.3 is $\chi^2$, and just regard the 
table as two distributions, the differences between which are to be tested 
by calculating $\chi'^2$, all will be well. 
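
The binomial index and the approximation for many degrees of freedom may be sketched as follows. The general function assumes the usual binomial form, $\chi^2 = S(m_s - m)^2 / \{m(1 - m/n)\}$, for sets of $n$ trials each; the germination counts are hypothetical, and only the total of 82.84 on 52 degrees of freedom is taken from the text.

```python
import math

def binomial_chi_square(successes, n_trials):
    """chi^2 = S(m_s - m)^2 / (m (1 - m/n)) for sets of n_trials each.

    Returns (chi^2, degrees of freedom), the latter being the number of sets less one.
    """
    g = len(successes)
    m = sum(successes) / g
    ss = sum((ms - m) ** 2 for ms in successes)
    return ss / (m * (1 - m / n_trials)), g - 1

def pair_chi_square(m1, m2, lot_size=50):
    """Special case of two lots: (m1 - m2)^2 / ((m1 + m2)(1 - (m1 + m2)/(2 lot_size)))."""
    total = m1 + m2
    return (m1 - m2) ** 2 / (total * (1 - total / (2 * lot_size)))

def normal_deviate(chi_square, df):
    """sqrt(2 chi^2) - sqrt(2 df - 1), roughly a unit normal deviate when df is large."""
    return math.sqrt(2 * chi_square) - math.sqrt(2 * df - 1)

# Hypothetical germination counts for one treatment (not from the experiment).
general, df = binomial_chi_square([41, 35], n_trials=50)
special = pair_chi_square(41, 35)
print(round(general, 4), round(special, 4), df)

# Total quoted in the text: 82.84 on 52 degrees of freedom.
print(round(normal_deviate(82.84, 52), 2))      # 2.72
```

The first line of output confirms that the expression for two lots is the general one with $g' = 2$.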


TABLE 5.3 

                           Frequencies 
$\chi^2$                Expected    Observed 
0       - 0.015 8          5.2          5 ┐ 
0.015 8 - 0.064 2          5.2          1 ┘ 
0.064 2 - 0.148            5.2          3 ┐ 
0.148   - 0.455           10.4         10 ┘ 
0.455   - 1.074           10.4         10 ┐ 
1.074   - 1.642            5.2          5 ┘ 
1.642   - 2.706            5.2          8 ┐ 
over 2.706                 5.2         10 ┘ 
Total                     52.0         52 


5.62. For the Poisson distribution, the index corresponding to that 
of equation (5.6) is 

$$\chi^2 = \frac{S(m_s - m)^2}{m},$$

where $m_s$ is any individual count, $m$ is the mean, and $S$ is the summation 
over all counts. Fisher, Thornton and Mackenzie (1922) 
first used this to test the accuracy of bacterial counts and found 
significant deviations from expectation. More recently, Smith and 
Prentice (1929) have used it to check their technique of cyst counts 
in soil. They had 73 sets of 10 counts from different samples of soil, 
and calculated the 73 values of $\chi^2$. The distribution of this is compared 
with the theoretical one for 9 degrees of freedom below in 
Table 5.4, and the expectations, being obtained from Fisher’s book 
(1936a), are for unequal intervals of $\chi^2$. 

TABLE 5.4 

                              Frequencies 
$\chi^2$              Expected ($\nu_s$)   Observed ($n_s$) 
 0     -  4.168             7.3                  7 
 4.168 -  5.380             7.3                  7 
 5.380 -  6.393             7.3                  6 
 6.393 -  8.343            14.6                 22 
 8.343 - 10.656            14.6                 18 
10.656 - 12.242             7.3                  2 
12.242 - 14.684             7.3                  5 
over 14.684                 7.3                  6 
Total                      73.0                 73 


The agreement is quite fair; it is tested in the usual manner by 
calculating another $\chi^2$, 

$$\chi^2 = S\frac{(n_s - \nu_s)^2}{\nu_s} = 9.604,$$

which for seven degrees of freedom gives $P$ greater than 0.2. 
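
Both steps, the index for a set of counts and the comparison of observed with expected frequencies, are short computations. In the sketch below the ten replicate counts are hypothetical, while the frequencies are those of Table 5.4; the function names are illustrative.

```python
def poisson_chi_square(counts):
    """Index S(m_s - m)^2 / m for replicate counts.

    Returns (chi^2, degrees of freedom), the latter being the number of counts less one.
    """
    m = sum(counts) / len(counts)
    return sum((c - m) ** 2 for c in counts) / m, len(counts) - 1

def goodness_of_fit(observed, expected):
    """S(n_s - nu_s)^2 / nu_s over the classes of a frequency table."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical set of ten replicate counts (not Smith and Prentice's data).
index, df = poisson_chi_square([12, 9, 15, 11, 8, 13, 10, 14, 9, 9])
print(round(index, 3), df)

# Table 5.4: expected and observed numbers of sets in each interval.
expected = [7.3, 7.3, 7.3, 14.6, 14.6, 7.3, 7.3, 7.3]
observed = [7, 7, 6, 22, 18, 2, 5, 6]
fit = goodness_of_fit(observed, expected)
print(round(fit, 2))
```

The second figure agrees with the value of about 9.6 quoted for the table.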




CHAPTER VI 


THE ANALYSIS OF VARIANCE 

Previous chapters are based on the presentation of collections of 
data in the form of frequency distributions and on the description 
of a few features, particularly the extent and form of the variation, 
by frequency constants. This is a process of summarisation in which 
some detail is inevitably lost. We shall now reverse the process and 
deal with statistical methods designed to recover some of the detail 
that is of value, starting from the frequency distribution and its 
constants as bases. Comparatively little has been done in this 
direction with the form of variation, but the amount of variation 
can be split up into parts associated with different causes or sources. 
The method of analysis of variation is introduced in its simple 
quantitative form in this chapter; and most of the subjects dealt 
with in subsequent chapters are developed from the same basic 
method. As a first step, we shall in the next section prove a funda- 
mental mathematical property of variance. 

Variance an Additive Quantity 

6.1. Variability may be considered to arise from a multitude of 
causes producing small deviations; but it often happens that to these 
there are added larger deviations due to a few more important 
causes, and the variation is not always homogeneous. It is a fundamental 
property of that measure of the degree of variability, the 
variance, that it is additive, i.e. if a quantity is subject to the operation 
of several independent causes each of which contributes a certain 
variance, then the final variance of the quantity is the sum of those 
due to the several causes. Let it be assumed, for example, that a 
quantity $x$ is subject to random variations, and to others associated 
with two factors $A$ and $B$; then the value of any one observation 
of $x$ is 

$$x = \xi + \alpha + \beta + \xi',$$

where $\xi$ is the mean, $\alpha$ and $\beta$ are the deviations arising from $A$ and $B$, 
and $\xi'$ is the random deviation. The square of the deviation of $x$ 
from its mean is 

$$(x - \xi)^2 = \alpha^2 + \beta^2 + \xi'^2 + 2\alpha\beta + 2\alpha\xi' + 2\beta\xi',$$

and this may be summed for a sample of $N$ individuals, and divided 
by the degrees of freedom ($N$ in this case, since we have not found 
the mean $\xi$ from the sample, but have assumed it). Thus we obtain 

$$\frac{S(x - \xi)^2}{N} = \frac{S\alpha^2}{N} + \frac{S\beta^2}{N} + \frac{S\xi'^2}{N} + \frac{2S\alpha\beta}{N} + \frac{2S\alpha\xi'}{N} + \frac{2S\beta\xi'}{N},$$

and as $N$ becomes indefinitely large, the last three terms of this 
equation tend to zero if $\alpha$, $\beta$ and $\xi'$ are independent; the other terms 
are the squares of the standard deviations or variances, so that 
finally 

$$\sigma_x^2 = \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_{\xi'}^2. \qquad (6.1)$$

Hence the variance of $x$ is the sum of the random variance and of 
those due to $A$ and $B$. We used this property in Chapter III to 
find the standard error of a difference in terms of those of the two 
means; since the variance is the square of the standard error, the 
above equation leads directly to equation (3.1).* We shall use it now 
to analyse the variability of quantities into parts. 
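
The additive property can be verified exactly on a small artificial case. The sketch below assumes three hypothetical sets of deviations, each with zero mean, and takes every combination once, so that the three sources are exactly independent and the product terms vanish.

```python
from fractions import Fraction
from itertools import product

def variance_about_zero(values):
    """Mean square about an assumed mean of zero (divisor N)."""
    return Fraction(sum(v * v for v in values), len(values))

# Hypothetical deviations, each set having zero mean.
alpha = [-1, 1]           # deviations due to factor A
beta = [-2, 0, 2]         # deviations due to factor B
random_dev = [-1, 0, 1]   # random deviations

# Every combination taken once: the sources are then exactly independent.
combos = list(product(alpha, beta, random_dev))
totals = [a + b + r for a, b, r in combos]    # deviation of x from its mean

var_total = variance_about_zero(totals)
var_parts = (variance_about_zero(alpha)
             + variance_about_zero(beta)
             + variance_about_zero(random_dev))
cross_terms = [sum(a * b for a, b, r in combos),
               sum(a * r for a, b, r in combos),
               sum(b * r for a, b, r in combos)]

print(var_total, var_parts, cross_terms)      # 13/3 13/3 [0, 0, 0]
```

In a real sample the product terms are not exactly zero; they only tend to zero as the sample grows.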

Analysis of Variance 

6.2. Consider Table 6.1 in which the data (Harris, 1910) are 
frequency distributions of ovaries containing different numbers of 
ovules, and are for ten separate shrubs of the American Bladder 
Nut. The separate columns are called arrays. It is obvious that there 
are considerable differences between the shrubs, for while shrub 11 
has ovaries with 22-30 ovules, shrub 13 has ovaries with between 
17 and 24 ovules, and the other shrubs show similar differences. The 
variations between ovaries on any one shrub are less than those 
between all taken together, and the whole table suggests that we 
may legitimately divide the total variability into two parts: one 
associated with differences between ovaries from the same shrub, and 
another with differences between shrubs. We say that there is an 
association between the shrubs and ovules per ovary, and as evidence 
adduce the fact that the mean variance of deviations from the shrub 
means as found by the method of section 5.1, which is 3.057, is very 
much less than the variance of the “totals” column (5.385). 

Indeed, Table 6.1 is really a manifold contingency table, but the 
association is expressed differently because one variate is quantitative; 
a treatment by the method of the contingency table would be very 
laborious with so many squares. 

* $\alpha$ and $\beta$ may be positive or negative. 

We may present the analysis of the variability into the two parts 
in a systematic manner. Let $x$ be any individual reading, $\bar{x}_1, \bar{x}_2, 
\ldots, \bar{x}_m$ the $m$ shrub means, $\bar{x}$ the grand mean, and $n$ the number 
of readings per shrub, so that there are $nm = N$ readings altogether. 

TABLE 6.1 

Frequencies of Ovaries (Series O, 1908 C). 

Ovules                     Serial Number of Shrub 
per Ovary    11   12   13   15   16   18   19   20   21   22   Totals 
   17         -    -    1    -    -    -    -    -    -    -        1 
   18         -    2   23    -    1    6    1    -    -    1       34 
   19         -    5   13    -    1    4    5    5    -    1       34 
   20         -    8   27    -    1    4    1    5    2    2       50 
   21         -   10    4    1    1    5    2    7    1    1       32 
   22         4   21   18    -    7   13    9   12    6    2       92 
   23         5   17    7    3    9   16   15   25    8   15      120 
   24        41   35    7   16   42   48   44   39   61   68      401 
   25        21    2    -   25   14    3   18    6   10    8      107 
   26         9    -    -   21    9    1    4    1    4    2       51 
   27         9    -    -    8    5    -    1    -    3    -       26 
   28         4    -    -    8    3    -    -    -    2    -       17 
   29         3    -    -   12    3    -    -    -    2    -       20 
   30         4    -    -    5    4    -    -    -    1    -       14 
   31         -    -    -    1    -    -    -    -    -    -        1 
Totals      100  100  100  100  100  100  100  100  100  100    1 000 


Then, for any ovary on any one shrub, the deviation from the grand 
mean is the deviation from the shrub mean plus that of the shrub 
mean from the grand mean: 

$$(x - \bar{x}) = (x - \bar{x}_s) + (\bar{x}_s - \bar{x}).$$

We will square these deviations, and sum for all observations from 
the one shrub; denoting this summation by $S'$ and applying the 
rules in section 1.311 we obtain 

$$S'(x - \bar{x})^2 = S'(x - \bar{x}_s)^2 + n(\bar{x}_s - \bar{x})^2 + 2(\bar{x}_s - \bar{x})S'(x - \bar{x}_s).$$

The term $(\bar{x}_s - \bar{x})^2$ is constant for the one shrub and its sum for 
the $n$ observations is $n$ times the single value (the second term in 
the above equation); the third term above is zero, since the sum 
of the deviations from the mean, $S'(x - \bar{x}_s)$, is necessarily zero. To 
obtain the total sum of squares, we now sum the terms of this 
equation again for all shrubs (using the sign $S$) and finally obtain 

$$SS'(x - \bar{x})^2 = SS'(x - \bar{x}_s)^2 + nS(\bar{x}_s - \bar{x})^2 \qquad (6.2)$$

The sign $SS'$ is equivalent to $S$, the simple summation over all 
observations in the sample. For the data of Table 6.1, equation 
(6.2) is 

$$5\,379.775 = 3\,026.350 + 2\,353.425,$$

and each of these terms is given an appropriate place in Table 6.2 
of the Analysis of Variance. The terms are the sums of squares of 
deviations, (a) of individual observations from the grand mean, 
(b) of individual observations from the shrub means, and (c) of the 
shrub means from the grand mean (the sum being multiplied by 
$n$ in this case). The degrees of freedom are given in the table; for 
the total, one mean has been found and there are thus 999 degrees; 
for the deviations from the shrub means, each shrub contributes 
99 degrees, giving a total of 990; and for the shrub means, 10 deviations 
are measured from the grand mean, leaving 9 degrees of freedom. 
The sums of squares and degrees of freedom of the parts should 
add up to give the “total,” and the sums of squares, divided by the 
degrees of freedom, give estimates of the variances. That for the 
“total” is the ordinary variance of all the observations in the sample, 
and that for “within a shrub” is the variance of the intra-shrub 
deviations found after the method of section 5.1, and is an estimate, 
based on 990 degrees of freedom, of the true variance, $\sigma_r^2$ (say). 
The variance “between shrubs” is $n$ times the square of the standard 
deviation of shrub means. To investigate this more fully, let us 
suppose we have an indefinitely large number of ovaries from each 
of an indefinitely large number of shrubs, and let the variance of 
these shrub means be $\sigma_s^2$; this is the true value for the infinite population. 
If now we have only $n$ ovaries from each shrub, the means are 
subject to random errors due to the intra-shrub variation, and their 
standard error is $\sigma_r/\sqrt{n}$. These errors increase the variability of the 
shrub means, and the resulting variance may be obtained by adding 
the two components, as shown in equation (6.1), giving 

$$\sigma_s^2 + \frac{\sigma_r^2}{n};$$

the variance of Table 6.2 is an estimate based on 9 degrees of 
freedom of $n$ times this. If $v_s$ is the variance between shrubs, and 
$v_r$ the variance within a shrub, as found from the sample, 

$$v_s \to n\sigma_s^2 + \sigma_r^2, \qquad v_r \to \sigma_r^2. \qquad (6.3)$$

These two relations are the basis of the expression of association. If
the variation between shrubs is relatively important, $\sigma_s^2$ is large
compared with $\sigma_r^2$, and the two estimates $v_s$ and $v_r$ are very different.
If the shrub variation is zero, $\sigma_s^2 = 0$ and $v_s$ tends to equal $v_r$. In
such a case, for normally distributed variates, $v_s$ and $v_r$ are two
independent estimates of the same variance, $\sigma_r^2$. They are subject to the
same random errors as are all estimates of variance, and the
significance of any difference should be tested by the methods of
section 5.4. For Table 6.2, the one (261.493) is so much greater
than the other (3.057), that the reality of the association needs no
test. Using the relations of equation (6.3)*

$$261.493 \to 100\sigma_s^2 + \sigma_r^2,$$
$$3.057 \to \sigma_r^2,$$
whence
$$2.584 \to \sigma_s^2.$$

TABLE 6.2
Analysis of Variance

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between shrubs         2353.425          9                     261.493
Within a shrub         3026.350          990                   3.057
Total                  5379.775          999                   5.385

* The sign $\to$ denotes that the quantity on the left is an estimate of that
on the right, and that the former approaches the latter as the size of the
sample (number of degrees of freedom in both parts) increases indefinitely.

If a sample of ovaries were taken at random from these shrubs, the
number from each shrub being left to chance, the variance would
be $\sigma_s^2 + \sigma_r^2$, of which an estimate is 5.641. This differs slightly
from the variance in the “total” row of Table 6.2 because the
sample to which the latter refers is not entirely random; it has been
arranged so that 100 ovaries come from each shrub.
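
The arithmetic of relations (6.3) can be checked with a short sketch. The function name `variance_components` is an illustrative assumption and does not come from the text; the inputs are the two variances quoted for Table 6.2 and n = 100 ovaries per shrub.

```python
def variance_components(v_between, v_within, n):
    """Solve relations (6.3) for the two components.

    v_between estimates n*sigma_s^2 + sigma_r^2 and v_within estimates
    sigma_r^2, so the difference divided by n estimates sigma_s^2.
    """
    sigma_r2 = v_within
    sigma_s2 = (v_between - v_within) / n
    return sigma_s2, sigma_r2


# Values quoted in the text for Table 6.2 (n = 100 ovaries per shrub).
sigma_s2, sigma_r2 = variance_components(261.493, 3.057, 100)
random_sample_variance = sigma_s2 + sigma_r2

print(round(sigma_s2, 3))                # estimate of sigma_s^2
print(round(random_sample_variance, 3))  # estimate of sigma_s^2 + sigma_r^2
```

The second figure is the variance expected when the number of ovaries from each shrub is left to chance, which is why it differs a little from the 5.385 of the “total” row.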

Table 6.3 summarises the above relations and sets out the analysis
of variance for the general case of n observations on each of m arrays.
The variance within an array is often called the residual; having
performed the analysis, we are not very interested in the variance
of the “total” row, since the two parts contain all the useful
information.

TABLE 6.3
Analysis of Variance

Source of Variation    Sum of Squares                 Degrees of Freedom    Variance
Between arrays         $nS(\bar{x}_s - \bar{x})^2$    m - 1                 $v_s \to n\sigma_s^2 + \sigma_r^2$
Within an array        $SS'(x - \bar{x}_s)^2$         m(n - 1) = N - m      $v_r \to \sigma_r^2$
Total                  $SS'(x - \bar{x})^2$           mn - 1 = N - 1        -

Computation 

6.3. When computing the sums of squares for Table 6.3, the various
deviations may be found explicitly, and squared and added. If the
arithmetic is correct, the sums of squares between and within arrays
should add up to give the total; this provides a check. Usually,
however, this process is laborious and it is convenient to apply a
modification of equation (1.2), p. 37. If the original units are used
so that h = 1, the appropriate part of equation (1.2) may be written

$$S(x - \bar{x})^2 = Sx^2 - N\bar{x}^2 = Sx^2 - \frac{T^2}{N} \qquad (6.4)$$

where S, $\bar{x}$ and N have their usual meanings and T is the total
value of the variate for the N observations, i.e. $T = Sx = N\bar{x}$.
Applying this to equation (6.2) and writing $T = SS'x$ and
$T_s = S'x$ for the sth shrub,* we have

$$SS'(x - \bar{x})^2 = SS'x^2 - \frac{T^2}{N},$$

$$SS'(x - \bar{x}_s)^2 = SS'x^2 - \frac{ST_s^2}{n}, \qquad (6.5)$$

and

$$nS(\bar{x}_s - \bar{x})^2 = \frac{ST_s^2}{n} - \frac{T^2}{N}.$$

The process of computation may be written in words:

(1) Square the individual values and add,

(2) Find the total value of the variate for the arrays, square, add,
and divide the result by the number of individuals per array,
and

(3) Find the total value of the variate for all individuals, square,
and divide by the grand total number of individuals.

Then if (1), (2) and (3) represent the results of the above operations,
equation (6.2) becomes

[(1) - (3)] = [(1) - (2)] + [(2) - (3)].
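
The three operations can be sketched in a few lines of code. The function name `anova_sums` and the small data set are illustrative assumptions, not taken from the text; the sketch assumes equal numbers in every array, as in equations (6.5).

```python
def anova_sums(arrays):
    """Sums of squares by operations (1), (2) and (3).

    `arrays` is a list of equally long lists, one per array.
    """
    n = len(arrays[0])          # individuals per array
    N = n * len(arrays)         # grand total number of individuals
    op1 = sum(x * x for a in arrays for x in a)    # (1) S S' x^2
    op2 = sum(sum(a) ** 2 for a in arrays) / n     # (2) S T_s^2 / n
    op3 = sum(sum(a) for a in arrays) ** 2 / N     # (3) T^2 / N
    return {"total": op1 - op3, "within": op1 - op2, "between": op2 - op3}


# Hypothetical data: three arrays of three observations each.
sums = anova_sums([[1, 2, 3], [2, 4, 6], [5, 6, 7]])
print(sums)
```

For these made-up figures the array means are 2, 4 and 6 about a grand mean of 4, so the between-array sum is 3 x (4 + 0 + 4) = 24 and the within-array sum is 12; the two parts add to the total of 36, which is the check mentioned above.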

When the data are grouped, readers should have no difficulty in 
applying equation (1.3), p. 38. In such instances, it is better not 
to use Sheppard’s corrections, but to keep the grouping fairly fine 
so that they are unimportant. The precise effect of these corrections 
is not certain; but they increase the apparent association by reducing 
the residual variance, so to neglect them in testing significances is 
to be on the safe side. 

It may be convenient to measure the values of the variate as
deviations from some arbitrary origin and to divide them by an
arbitrary constant h before summing and squaring. Then after applying
equations (6.5) it is only necessary to multiply the resultant
sums of squares by $h^2$. The process of computation based on
equations (6.5) may easily be performed exactly, with the aid of a table
of squares, down to the last stages of dividing by the numbers of
individuals, and since there are only two such divisions, these may
be performed with ample accuracy without much labour.

* These values of T are not the total numbers of individuals but are the
total amounts of variate in the several arrays. In Table 6.1, $T_s$ is the total
number of ovules in the 100 ovaries from the sth shrub.

As an example, we have analysed the variance of the 100 readings
of Table 1.3, p. 32, into two portions, between groups of 5 and
within a group. The results are in Table 6.4.

TABLE 6.4
Analysis of Variance (Data Table 1.3)

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between groups         1808.35           19                    95.18
Within a group         7056.40           80                    88.20
Total                  8864.75           99                    -

Since the groups are random samples $\sigma_s^2 = 0$, and the two estimates
of variance should be nearly equal, as indeed they are; actually the
variance within a group is less than that between groups, and we will
test the significance of this difference. In the notation of section 5.4,

$$z = \tfrac{1}{2}[\log_e 95.18 - \log_e 88.20] = 0.0381, \quad n_1 = 19 \text{ and } n_2 = 80.$$

From Fisher’s table we see that when $n_1 = 24$ and $n_2$ = infinity,
a z of 0.2085 lies on the 0.05 point, so our z of 0.0381 is quite
insignificant, and the two estimates of variance are equivalent.
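
The value of z for Table 6.4 can be reproduced with a minimal sketch. The helper name `fisher_z` is an assumption made for illustration; the tabular 0.05 point of 0.2085 is simply copied from the text, not computed.

```python
import math


def fisher_z(v_larger, v_smaller):
    """z = half the difference of the natural logarithms of two variances."""
    return 0.5 * (math.log(v_larger) - math.log(v_smaller))


# Variances of Table 6.4: between groups (19 d.f.) and within a group (80 d.f.).
z_table_6_4 = fisher_z(95.18, 88.20)
point_0_05 = 0.2085  # 0.05 point for n1 = 24, n2 = infinity, as quoted in the text

print(round(z_table_6_4, 4))
print(z_table_6_4 < point_0_05)  # True: the difference is not significant
```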

Tables with Non-uniform Array Totals 
6.4. The analysis of variance may be performed on tables in which
the numbers of individuals in the arrays are not equal. If the numbers
in the arrays are $n_1, n_2, \ldots, n_s, \ldots, n_m$, equation (6.2) becomes

$$SS'(x - \bar{x})^2 = SS'(x - \bar{x}_s)^2 + Sn_s(\bar{x}_s - \bar{x})^2 \qquad (6.6)$$

and $S(T_s^2/n_s)$ is written for $ST_s^2/n$ in equations (6.5).

In such instances, the relations of equations (6.3) do not hold, for
in summing the squares of the deviations of the group means from
the grand mean, each group has been given a different weight, $n_s$.
If the true variance between groups ($\sigma_s^2$) is zero, however, $v_s \to v_r$,
and the test for the existence of association is that $v_s$ and $v_r$ are
significantly different.




TABLE 6.5
Length of Cuckoos' Eggs in mm. (central values)

Table 6.5 gives distributions of the lengths of cuckoos’ eggs
found in the nests of a variety of other birds (Latter, 1903). The total
frequencies for the kinds of nest vary, and since the observations are
fewer, the association is not quite so obvious as in the previous
example. The grouping is rather fine for such a small sample, but
has been adopted because Sheppard’s corrections will not be applied.
The values have been measured as deviations from the length
23.05 mm. in terms of the unit of grouping 0.2 mm. The quantities
$T_s$ and T are given in these units at the foot of the table, and
we may calculate that

$$SS'x^2 = 3934, \quad S\frac{T_s^2}{n_s} = 1577.79 \quad \text{and} \quad \frac{T^2}{N} = 504.30.$$

From these, the terms of equation (6.6) are found and inserted in
Table 6.6. The variance between nest types is greater than the
residual, and we will test it for significance. We find

$$z = \tfrac{1}{2}[\log_e 214.7 - \log_e 20.7] = 1.17, \quad n_1 = 5 \text{ and } n_2 = 114.$$

For $n_1 = 5$ and $n_2 = 60$, z = 0.779 is on the 0.1 per cent.
point, so we must conclude that the association between egg-length
and type of nest is real. These variances are in terms of the arbitrary
units (0.2 mm.), and if they are required in mm.$^2$ they must be
multiplied by 0.04.

TABLE 6.6
Analysis of Variance (Lengths of Cuckoos’ Eggs)

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between nest types     1073.49           5                     214.7
Within a nest type     2356.21           114                   20.7
Total                  3429.70           119                   -
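
Table 6.6 follows from the three quoted sums alone, as the sketch below shows. The variable names are illustrative assumptions; the counts of 6 nest types and 120 eggs are inferred from the degrees of freedom (5 and 119) and are not stated separately in the text.

```python
import math

# The three quantities quoted in the text, in units of 0.2 mm.
sum_x2 = 3934.0          # S S' x^2
sum_ts2_over_ns = 1577.79  # S (T_s^2 / n_s)
t2_over_n = 504.30       # T^2 / N

m_nest_types = 6         # inferred from 5 degrees of freedom
n_eggs = 120             # inferred from 119 degrees of freedom

between_ss = sum_ts2_over_ns - t2_over_n
within_ss = sum_x2 - sum_ts2_over_ns
total_ss = sum_x2 - t2_over_n

v_between = between_ss / (m_nest_types - 1)
v_within = within_ss / (n_eggs - m_nest_types)
z_eggs = 0.5 * (math.log(v_between) - math.log(v_within))

# Conversion from squared grouping units to mm^2: multiply by 0.2^2 = 0.04.
v_between_mm2 = v_between * 0.04
v_within_mm2 = v_within * 0.04

print(round(between_ss, 2), round(within_ss, 2), round(total_ss, 2))
print(round(v_between, 1), round(v_within, 1), round(z_eggs, 2))
```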

Relation between z and t Tests 

6.5. A special case of analysis occurs when there are two groups;
this is the comparison between two samples. If there are $n_1'$ and
$n_2'$ observations in each group, $\bar{x}_1$ and $\bar{x}_2$ are the means, and $\bar{x}$ is the
grand mean; the analysis of variance is as set out in Table 6.7.
$S_1$ and $S_2$ are the summations over samples 1 and 2, and S is the
summation over the whole; also, if $v_s$ is greater than $v_r$,

$$z = \tfrac{1}{2}[\log_e v_s - \log_e v_r] = \log_e \sqrt{v_s/v_r},$$

and the degrees of freedom are $n_1 = 1$ and $n_2 = n_1' + n_2' - 2$.

The square root of the ratio of variances is the t of equation (5.3),
p. 115, which was used in testing the difference between two means.
Thus, if we make the transformation

$$z = \log_e t \quad \text{or} \quad t = e^z,$$

the distributions of t for n degrees of freedom, say, and of z for
$n_1 = 1$ and $n_2 = n$ are equivalent. This is because z and t are
mathematically related; for any value of t there is only one value of z.

TABLE 6.7
Analysis of Variance (Two Groups)

Source of Variation    Sum of Squares                                                         Degrees of Freedom    Variance
Between samples        $(\bar{x}_1 - \bar{x}_2)^2 \big/ \left(\frac{1}{n_1'} + \frac{1}{n_2'}\right)$    1                     $v_s$
Within a sample        $S_1(x - \bar{x}_1)^2 + S_2(x - \bar{x}_2)^2$                          $n_1' + n_2' - 2$     $v_r$
Total                  $S(x - \bar{x})^2$                                                     $n_1' + n_2' - 1$     -

A slight modification in interpretation of the z test is necessary
before we can show its equivalence to that based on t. For the simple
use of z to test the difference between two variances, we showed in
section 5.4 that the 0.05 level of significance is about the 2.5 per
cent. point; on the other hand, when used as an alternative to t, the
0.05 point of z is also the 0.05 level. When testing two variances,
the hypothesis is that in the population z = 0, and a positive or
negative deviation, if sufficiently large, may be incompatible with this,
showing that the variances are really different.* Now when t is zero,
the corresponding z is minus infinity, and when t is plus infinity,
z is also plus infinity; hence the whole distribution of z is equivalent
to the positive half only of the distribution of t. Applying the
argument of section 3.32, if t is always made positive, the 0.05 level of
significance is the deviation of t at which an ordinate cuts off a tail
0.05 of the area of the positive half of the curve of t; consequently,
if we use the curve of z instead, the deviation beyond which the tail
is 0.05 of the whole curve is the 0.05 level of significance (since the
whole of the z curve is equivalent to the positive half of that for t).
The difference between the two treatments of z lies in the type of
question asked. In the former use we ask if z is significantly different
from zero, and a large negative value may be so; in the latter use
for comparing two means we ask if z is significantly greater than
minus infinity (i.e. if t is greater than zero), and no negative value
can be so. A large negative value of z is quite compatible with the
hypothesis that the means of the two samples are really equal, for it
only shows that t is smaller than would be expected, and that the
samples are a little too much alike. These varying distinctions
between 0.05 points and levels of significance are very puzzling, but
they should be clearly understood.

We find from the tables that when n = 24, t = 2.064 lies on the
0.05 level, whence $z = \log_e 2.064 = 0.725$ for $n_1 = 1$ and $n_2 = 24$
should also lie there. Fisher’s tables of z give the value 0.7246 as
the 0.05 point.
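
The correspondence between t and z in this example is easily checked numerically. The sketch below uses only the tabular value t = 2.064 quoted above; the variable names are illustrative assumptions.

```python
import math

t_0_05 = 2.064                 # 0.05 level of t for n = 24, as quoted in the text

# With n1 = 1 the ratio of variances is t squared, so z = (1/2) log_e(t^2) = log_e t.
variance_ratio = t_0_05 ** 2
z_from_ratio = 0.5 * math.log(variance_ratio)
z_from_t = math.log(t_0_05)

print(round(z_from_t, 4))                    # compare with Fisher's 0.7246
print(round(math.exp(z_from_t), 3))          # back-transformation t = e^z
```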

Significances of Groups of Means 

6.6. This process of analysis of variance is the better method of
testing the differences between several means as a whole, referred 
to in section 5.5, and is eminently suited to small samples; for the 
distributions of z which are used are exact, provided the form of 
the residual deviations is normal and their variance is constant for 
all means. When Fisher’s tables of z are used to test the significance 
of differences between several means, the 0-05 point is also the 
same level of significance, as we have just shown for pairs of means. 

* The fact that we interchange the degrees of freedom, choosing $n_1$ to be
associated with the larger variance and keeping z positive, is purely a matter
of computing convenience, and avoids the necessity of having duplicate
tables with negative values of z. The complete distribution of z contains
negative deviations from zero.

Table 6.8 is taken from a paper by Warren (1909), and shows the
mean head breadths of numbers of termites (small soldiers) taken
from five nests during five months.

We will first analyse the variance into two parts, between nests
and within a nest. The actual deviations from the grand and nest
means have been squared and summed and the sums are entered
in Table 6.9. The between-nest variance is much greater than
its residual, giving a z of 0.829, which lies beyond the 0.01 point
(z = 0.7443 is on it); so the variation from nest to nest is real.
This merely confirms what would be expected from an examination
of Table 6.8, for nests 670 and 675 have the highest means in all
months, 672 and 674 nearly always come next, and 668 has the
lowest mean in all but the month of August.

TABLE 6.8*
Mean Head Breadths of Termites (Small Soldiers) in mm.

Nest Number    668       670       672       674       675       Means
November       2.273     2.479     2.404     2.447     2.456     2.4118
January        2.332     2.603     2.457     2.388     2.626     2.4812
March          2.375     2.613     2.452     2.515     2.633     2.5176
May            2.373     2.557     2.396     2.445     2.487     2.4516
August         2.318     2.377     2.279     2.312     2.410     2.3392
Means          2.3342    2.5258    2.3976    2.4214    2.5224    2.4403

The reality of the differences between months is not quite so clear,
for although August has the lowest mean in all nests but one, and
March the highest in all but another, the other months show no
consistent differences. We can, however, test this in the same way
by analysing the total variance into two parts, as in the second part
of Table 6.9. The “between months” variance is certainly greater
than the residual, but

$$z = \tfrac{1}{2}\log_e \frac{0.02351}{0.00872} = 0.496,$$

and being barely on the 0.05 point (z = 0.5265 is just on it) is of
doubtful significance; hence on the simple analysis, the seasonal
variation, though suggestive, is not well established. The situation
is complicated, however, by the nest variation, and this treatment
is not quite sufficient; more complete methods will be dealt with in
section 10.3.

* This sort of table must not be confused with one like Table 6.1. Here
the entries are measurements, there they are frequencies; here the characters
are three in number (head breadth, nest number, and month of year), there
they are only two (ovules per ovary and shrub number).
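
Both analyses of Table 6.9 can be reproduced from the entries of Table 6.8. The sketch below is an illustration only; the function name `one_way` and the layout of the data (one row per month, one column per nest, in the order 668, 670, 672, 674, 675) are assumptions made for the example.

```python
import math

# Table 6.8: rows are November, January, March, May, August;
# columns are nests 668, 670, 672, 674, 675.
table = [
    [2.273, 2.479, 2.404, 2.447, 2.456],
    [2.332, 2.603, 2.457, 2.388, 2.626],
    [2.375, 2.613, 2.452, 2.515, 2.633],
    [2.373, 2.557, 2.396, 2.445, 2.487],
    [2.318, 2.377, 2.279, 2.312, 2.410],
]


def one_way(groups):
    """Return (between SS, within SS, total SS, z) for equal-sized groups."""
    n = len(groups[0])
    m = len(groups)
    grand = sum(sum(g) for g in groups) / (n * m)
    between = n * sum((sum(g) / n - grand) ** 2 for g in groups)
    total = sum((x - grand) ** 2 for g in groups for x in g)
    within = total - between
    v_between = between / (m - 1)
    v_within = within / (m * (n - 1))
    z = 0.5 * math.log(v_between / v_within)
    return between, within, total, z


by_month = table
by_nest = [list(column) for column in zip(*table)]

nest_between, nest_within, total_ss, z_nests = one_way(by_nest)
month_between, month_within, _, z_months = one_way(by_month)

print(round(nest_between, 6), round(nest_within, 6), round(z_nests, 3))
print(round(month_between, 6), round(month_within, 6), round(z_months, 3))
```

Each analysis has 4 degrees of freedom between and 20 within, so the two values of z are compared with the same tabular points.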

TABLE 6.9
Analysis of Variance (Head Breadths of Termites)

Source of Variation    Sum of Squares    Degrees of Freedom    Variance
Between nests          0.137443          4                     0.03436
Within a nest          0.130967          20                    0.00655
Between months         0.094046          4                     0.02351
Within a month         0.174363          20                    0.00872
Total                  0.268409          24                    -

6.7. We have shown that the method of the analysis of variance
well expresses the heterogeneity of variability. This heterogeneity
is obvious when the effect is great, but the analytical method is
objective, and so is particularly useful when the effect is smaller
and not so obvious. Like all quantitative methods, statistical or
otherwise, it merely specifies objective measures for qualities which
may be subjectively appreciated in extreme cases. So far, however,
we have not been concerned with more than testing for the existence
of association, and have paid no attention to the problem of expressing
its strength. As in all such tests, the probability of significance,
depending as it does on the size of the sample, gives no indication
of that strength.

Although the methods and tests described in this chapter are only
applicable when the distribution within each array is normal, work
by E. S. Pearson and others, published in Biometrika, indicates that
moderate skewness, such as is apparent in Table 6.1, is not likely
to lead to serious error.
It is further assumed that the variance within the separate arrays
is constant within the limits of the errors of random sampling.
The residual variance is a mean variance for all arrays. Where
non-uniformity is suspected, its significance may be tested by the method
of section 5.5.



CHAPTER VII 


CORRELATION 

CORRELATION TABLES AND SCATTER DIAGRAMS
7.1. When dividing the total variability of a quantity into parts,
one of which is associated with arrays as in Table 6.1, it is by no
means inevitable that the character which defines the arrays should 
be some qualitative description or serial number; it also may be 
the groups of some quantitative variate. Tables 7.1 to 7.5 are 
examples in which the individuals are classified according to two 
quantitative variates; these are called correlation tables. In the first 
of these, the arrays are groups of eggs having a small sub-range of 
length; each array is regrouped according to longitudinal girth. The 
association between the two quantities is well marked, and it is clear 
that there is also a tendency for the length to increase with the girth 
quite regularly. We have thus reached another stage in the treatment 
of variability; for while the single distribution shows the extent and 
form of the variation and the table of arrays introduced in the last 
chapter shows the association of parts of the variation with other 
factors, now the correlation table discovers the nature of that asso- 
ciation when the other factor is a quantitative variate. It will be 
noticed, however, that such a table may be approached in two ways; 
either variate may be regarded as the one which is being analysed, 
and both the columns and rows are arrays. 

When arranging correlation tables, it is convenient to choose the
grouping so that there are from ten to twenty for each variate; if any
observation appears to fall exactly on the dividing line between two
groups, a half may be given to each, and similarly it may happen
that a quarter of a unit may be assigned to each of four adjacent
cells in the table. In specifying the characters, either the sub-ranges
of the groups may be given, as in Tables 7.1 and 7.4, or the central
values as in Table 7.5. The actual process of making a correlation
table may be carried out in two ways. The first is to mark the
position of each observation in the table by means of a dot or a
stroke and finally to count the marks. This, however, is only suitable
for fairly small samples, for if one makes a mistake or loses the
place, there is nothing for it but to start again; further, the only
means of checking the accuracy of the table is to repeat it. The
second method is to write the values of the characters of each
individual on a separate card, and to sort the cards; the first sorting is
into groups according to one character, and then each group is
re-sorted according to the second character. It is then an easy matter to
look through each pile of cards to see that none is out of place, and
finally to count them and enter the numbers in a table. The writing
of the cards takes some time, but it is a saving in the long run,
particularly if there are more than two characters; it is often possible
to collect the data on cards and so avoid copying.

Fig. 8. Scatter Diagram.

If the observations are few, a correlation table is not suitable, and
its place may be taken by a scatter diagram. This is just an ordinary
graph in which x and y are the two variates, and points on this
represent observations; Fig. 8 is an example, and shows the
relation between length and fineness of the hairs of a number of
varieties of cotton.* If the relationship between the two variates
is exact, the points in the scatter diagram lie on a smooth curve,
while as the relationship weakens, the diagram more and more
resembles the result that would be obtained by sprinkling the points
evenly with a pepper-pot; the former type of diagram is usually
the experience of the physicist. The scatter diagram and correlation
table are equivalent, except that in the latter the area is divided
into a number of rectangles, and the number of points in each one
counted and substituted by the numerals. It is well to choose the
scales of a diagram so that the range of variation is about the same
in both directions, and the diagram is nearly square.

* Data by Morton (1936).

TABLE 7.1
Length and Longitudinal Girth of Eggs of the Common Tern
r = +0.89
(Data from Co-operative Study, 1923)

The Correlation Coefficient 

7.2. Correlation tables may show all degrees of association from
the obvious one between the length and girth of eggs in Table 7.1 
to the inappreciable one between head length and reaction time to 
sight in Table 7.5. This quality is measured by the correlation 
coefficient (r), and the corresponding value is given with each of 
Tables 7.1 to 7.5; it is small when the association is weak, and 
large when the relationship is fairly close and strong. We shall now 
proceed to discuss the meaning of this coefficient and related con- 
stants, postponing a description of the methods of estimation and 
computation to a later part of the chapter. The correlation coefficient 
may be regarded from three points of view, discussed in the following 
sections headed Frequency Surfaces, Regression Lines and Analysis 
of Variance. 

Frequency Surfaces 

7.21. Just as a single frequency distribution may be represented
graphically by a histogram, so a correlation table may be represented
by a similar figure in three dimensions. The base is divided into a
number of rectangles representing the cells of the table, by a number
of lines parallel to the x- and y-axes, and on these rectangles are
raised columns, proportional in volume to the frequency in the 
corresponding cell of the table. Such is an empirical frequency 
surface, and it must be noted that frequencies are measured by 
volumes. 

TABLE 7.^
Numbers of Pistils and Stamens in Ranunculus Ficaria
Late Flowers, r = +0.75
(Data by Weldon, 1901)

TABLE 7.4
Daily Maximum and Minimum Temperatures at Rothamsted for August 1878-1926
(Data by Fisher and Hoblyn, 1928.) r = +0.30

Such a surface, of course, is made up of step-like figures (rather
reminiscent of the basaltic columns of the Giant’s Causeway in
Northern Ireland) and shows irregularities; but as the size of the
sample is increased the irregularities disappear, and as the number
of groups is increased the steps become smaller, until in the limit a
smooth surface (analogous to the frequency curve for a single variate)
results. The progress in devising systems of mathematical formulae
to describe such surfaces is less than has been made for the single
variate distributions, and methods based on the normal surface are
applied to most distributions. The equation to the normal surface is


$$df = \frac{N}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}} \exp\left\{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)\right\} dx\,dy,$$

where x and y are the two variates measured as deviations from their
means, $\sigma_x$ and $\sigma_y$ are constants equal to the standard deviations of
the two variates, r is a constant equal to the correlation coefficient,
N is the total number in the sample, and df is the element of frequency
in the range dx dy with its centre at (x, y). If we cut this surface
by any vertical plane parallel to either the x- or y-axis, the section is
a normal frequency curve of standard deviation $\sigma_x\sqrt{1 - r^2}$ or
$\sigma_y\sqrt{1 - r^2}$. Fig. 9 shows the contours of surfaces in which
correlation coefficients are 0, 0.3, 0.6 and 0.9, and the standard deviations
of x and y are both equal to unity; it will be seen that the surfaces
all rise to a hump at the centre, but that they tend also to form a
diagonal ridge which becomes narrower and sharper as r increases,
as would be expected from the fact that the standard deviation of a
sectional curve becomes smaller as r approaches unity. The
correlation coefficient can never be greater than unity, and as it approaches
that value, the ridge tends to become a thin outline in the form of a
normal curve running diagonally; when r = 0, vertical planes through
the centre parallel to the x- and y-axes divide the surface into four
equal quadrants. The correlation coefficient can be negative, but the
only difference that makes is to cause the major axes of the ellipses
to follow the other diagonal.
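
The normal surface and the standard deviation of its sections can be explored with a short sketch. The function names `normal_surface` and `section_sd` are illustrative assumptions; the formula coded is the standard bivariate normal surface, with x and y measured from their means.

```python
import math


def normal_surface(x, y, sigma_x, sigma_y, r, N=1.0):
    """Frequency per unit area of the normal surface at the point (x, y)."""
    q = ((x / sigma_x) ** 2 + (y / sigma_y) ** 2
         - 2.0 * r * x * y / (sigma_x * sigma_y))
    scale = N / (2.0 * math.pi * sigma_x * sigma_y * math.sqrt(1.0 - r * r))
    return scale * math.exp(-q / (2.0 * (1.0 - r * r)))


def section_sd(sigma, r):
    """Standard deviation of a section of the surface parallel to one axis."""
    return sigma * math.sqrt(1.0 - r * r)


# Height of the central hump for r = 0.9 relative to r = 0 (unit standard deviations).
peak_ratio = normal_surface(0, 0, 1, 1, 0.9) / normal_surface(0, 0, 1, 1, 0.0)

# Section at x = 1 for r = 0.6: the curve in y is centred at r*x = 0.6 and
# falls by the factor exp(-1/2) at one section standard deviation from it.
centre = 0.6 * 1.0
sd = section_sd(1.0, 0.6)
drop = normal_surface(1, centre + sd, 1, 1, 0.6) / normal_surface(1, centre, 1, 1, 0.6)

print(round(peak_ratio, 3), round(sd, 3), round(drop, 4))
```

The narrowing of the sections as r approaches unity is what produces the sharper diagonal ridge described for Fig. 9.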


Regression Lines 

7.22. An easier way of treating a correlation table is to draw a curve
relating the means of the arrays of x and y. This can be done in two
ways; either we may find the mean of y for every array of x (i.e.
referring to Table 7.1, find the mean girths when the lengths of the
eggs are successively 3.55-3.60, 3.70-3.75, 3.75-3.80, etc., cm.),
or we may find the mean of x for every array of y (mean lengths
when the girths are successively 9.80-9.90, 10.30-10.40,
10.40-10.50, etc., cm.). In Fig. 10 are given the array means for each of
the five correlation tables; the means of y for arrays of x being
represented by circles, and those of x for arrays of y by dots; in the
former x is called the independent and y the dependent variable, and
in the latter, vice versa.* The two sets of means are rather irregular,
but it is presumed that this is due to the errors of random sampling,
and that smooth curves drawn through the points represent
more nearly the relationships that would be obtained if the size of
the sample were increased indefinitely. The simplest curve that
can be drawn, and one which is often a sufficient approximation,
is the straight line, and appropriate ones are drawn on Fig. 10.
For the maximum and minimum temperatures the circles giving
the mean y for values of x lie above the straight line at both extremes,
suggesting that the straight line is not quite adequate; however, this
is typical of the sort of data to which this method of correlation is
often applied as an approximation, and we will use the straight lines.
The two sets of means give two straight lines, which become more
divergent as the association decreases; indeed, the diagram
corresponding to Table 7.5 shows the two lines to be practically
perpendicular. The line which is least inclined to the x-axis has x as the
independent variable, and similarly the other, being more nearly
parallel to the y-axis, has y as the independent variable.

* The means of the extreme groups are naturally very inaccurate, being
based on very few observations, and it is usual to combine several such
groups. For example, in Table 7.1, if we assume the central values of
lengths to be 3.575, 3.625, etc., cm., and those of girths to be 9.85, 9.95,
etc., cm., we have the means in Table 7.11, with length as the independent
variable. To combine the three readings, we find the weighted means of
the length and girth. That of the length is

$$\frac{3.575 + 2 \times 3.725 + 6 \times 3.775}{9} = 3.742,$$

and similarly that of the girth is 10.59. Points obtained in this way are
shown by larger circles and dots in Fig. 10.

TABLE 7.11

Length         Number in Group    Mean Girth
3.575          1                  9.85
3.725          2                  10.70
3.775          6                  10.67
Means 3.742    -                  10.59
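
The combination of extreme groups described in the footnote is a weighted mean, which may be sketched as follows. The function name `weighted_mean` is an assumption for illustration; the figures are those of Table 7.11.

```python
def weighted_mean(values, weights):
    """Mean of `values`, each weighted by the number of observations behind it."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Table 7.11: the three shortest length groups of Table 7.1.
lengths = [3.575, 3.725, 3.775]
numbers = [1, 2, 6]
mean_girths = [9.85, 10.70, 10.67]

combined_length = weighted_mean(lengths, numbers)
combined_girth = weighted_mean(mean_girths, numbers)

print(round(combined_length, 3), round(combined_girth, 2))
```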

The general equation of a straight line with x as the independent
variable is

Y = ax + b

where a and b are constants and the capital letter Y denotes the
value of the dependent variable given by the line, as distinguished
from the actual value of an individual, denoted by y. The problem
of estimating the most appropriate values of a and b to fit a line to
any given data is analogous to that of determining estimates of
parameters for fitting frequency distributions, and the general method
of solution that is most in accordance with statistical theory is the
method of least squares. This consists in choosing the constants a
and b so that the sum over all individuals of the quantities $(y - Y)^2$*
is reduced to a minimum. A similar line may be obtained giving X
in terms of y. The equations to the two lines are deduced in
section 7.6 and may be written

$$(Y - \bar{y}) = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}) \qquad (7.1)$$

$$(X - \bar{x}) = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}) \qquad (7.11)$$

where $\bar{x}$ and $\bar{y}$ are the two grand means, $\sigma_x$ and $\sigma_y$ are the two
standard deviations, and r is the correlation coefficient. The value
of Y given by equation (7.1) is an estimate of the mean value of
the array corresponding to a small sub-range about any given value
of x, and that of X given by equation (7.11) is similarly the mean
of the array corresponding to a given value of y. Equation (7.1)
represents the line that is more nearly parallel to the x-axis; both
lines pass through the point $(\bar{x}, \bar{y})$. These are called regression lines
and the constants

$$r\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad r\frac{\sigma_x}{\sigma_y}$$

are called regression coefficients. The coefficients are the tangents of
the angles the two lines make with the x- and y-axes respectively.

* In a diagram, these are the distances of the points from the line measured
in a direction parallel to the y-axis. They are not perpendicular distances
from the line. Further, these points represent individuals, and not array
means. If array means are used, a weighted sum of $(\bar{y}_s - Y)^2$ is reduced to
a minimum and a similar result is obtained.
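
The two regression coefficients can be computed from ungrouped pairs of observations as sketched below. The function name `regression_lines` and the five pairs of values are hypothetical, chosen only to make the arithmetic easy to follow; standard deviations are taken with divisor N, which does not affect r or the coefficients.

```python
import math


def regression_lines(xs, ys):
    """Means, correlation coefficient and the two regression coefficients."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sxx / n)
    sd_y = math.sqrt(syy / n)
    r = sxy / math.sqrt(sxx * syy)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "r": r,
        "b_yx": r * sd_y / sd_x,  # coefficient in equation (7.1): Y on x
        "b_xy": r * sd_x / sd_y,  # coefficient in equation (7.11): X on y
    }


# Hypothetical observations, not taken from the text.
fit = regression_lines([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(fit)
```

The product of the two coefficients is r squared, so when r = 1 each is the inverse of the other and the two lines coincide.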

If r = 0, equation (7.1) gives $(Y - \bar{y}) = 0$, an equation which
is satisfied by a line parallel to the x-axis, and going through the
mean, $\bar{y}$; similarly equation (7.11) gives $(X - \bar{x}) = 0$, and this is
satisfied by a line through the mean, $\bar{x}$, and parallel to the y-axis;
the two lines are perpendicular. The meaning of these two lines is
that the mean value of y is the same for all values of x, and that
the mean of x does not change with y, or in other words, that there
is no association between x and y; and thus we see that r is zero
for zero association or independence. If r = 1, one regression
coefficient is equal to the inverse of the other, and the two lines
coincide. This is the condition for perfect association, i.e. when x is
uniquely determined by y (and vice versa) and is the state of affairs
at which the physicist aims when he controls his experiment so that
only the two factors vary and there are no extraneous disturbing
influences. The biologist usually has to deal with materials subject
to variations over which his control is limited, and consequently he
obtains the kind of result illustrated in Tables 7.1 to 7.5, where
varying strengths of relationship are shown. If these tables are
studied with Fig. 10, it will be noticed that the stronger the
relationship, the greater is r and the closer together are the two regression
lines. If $\sigma_x = \sigma_y$, or if the diagrams are plotted on such a scale that
units of $\sigma_x$ and $\sigma_y$ are equal, the tangent of the angle between each
regression line and the axis of the corresponding independent variable
becomes equal to r. If r is positive, the slopes of the lines are in the
direction indicating that x increases with y, while if r is negative the
slopes are in the opposite direction, showing that x increases as y
decreases.

Analysis of Variance 

7.23. The preceding section, dealing with regression, pays attention 
to the relationship between x and y, regarding the deviations rather 
as errors due to uncontrollable factors; we will now consider the 
correlation table from the point of view of the variability. Adopting 
the method of the last chapter, we can analyse the variance of, say, 
egg girth (Table 7.1) into two portions, one associated with differences 
between the array means for the various groups of length, 
and the other with residual deviations within the arrays. This can 
be done by finding the individual array means, as was done in the 
last chapter, and calculating their variance and the variance of the 
residual deviations from them. For Table 7.1 we should combine the 
first five and the last two, giving 18 arrays, or 17 degrees of freedom 
on which to estimate the variance between arrays, and 937 on which 
to estimate the residual. If, however, we are satisfied that a straight 
regression line like equation (7.1) sufficiently represents the trend, 
the value Y of the girth given by this line for various central values 
of the length (x) may be used instead of the array means $\bar{y}_s$. For 
Table 7.1 the line of regression of y on x (as computed in section 
7.5) is 

(Y - 11.378) = 1.756 (x - 4.190); 

when x = 3.575, Y - 11.378 = -1.080 and Y = 10.298, 

when x = 3.725, Y - 11.378 = -0.817 and Y = 10.561, 

and so on. For the sum of squares of the deviations of the array 
means from the grand mean (11.378) we take $1 \times (-1.080)^2 
+ 2 \times (-0.817)^2$, etc., adding these quantities for all arrays; for 
the squares of the residual deviations from the array means we take 
$(9.85 - 10.298)^2 + (10.35 - 10.561)^2 + (11.05 - 10.561)^2$, etc., 
adding these quantities for all cells in the table, while for the total 
sum of squares we use the marginal totals as usual. These three 
sums may be found and entered in a table of analysis of variance 
in just the same way as if we had used the actual array means. 
This process can be carried out explicitly as indicated, and the 
reader is recommended to do it as an exercise; alternatively, all the 
various sums of squares may be found from some of the constants 
of the table as shown below. We will set out the process algebraically. 
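
The arithmetic for the regression values quoted above may be sketched as follows. This is only an illustration in Python using the constants of the regression line given in the text; the names `MEAN_X`, `MEAN_Y`, `SLOPE` and `regression_girth` are hypothetical labels introduced here.

```python
# Constants of the regression of girth (y) on length (x) for Table 7.1,
# as quoted in the text; the variable names are illustrative only.
MEAN_X = 4.190   # mean length, cm
MEAN_Y = 11.378  # mean girth, cm
SLOPE = 1.756    # regression of y on x, cm per cm

def regression_girth(length_cm):
    """Girth Y given by the regression line for a central value of length."""
    return MEAN_Y + SLOPE * (length_cm - MEAN_X)

for length in (3.575, 3.725):
    y_hat = regression_girth(length)
    print(f"x = {length:.3f}   Y - 11.378 = {y_hat - MEAN_Y:+.3f}   Y = {y_hat:.3f}")
```

Each value of Y so obtained takes the place of the corresponding array mean when the sums of squares are formed.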

Repeating equation (6.6), section 6.4, except that the variable is 
changed to y, we have for the first method of analysis, 

$$S\,S'(y - \bar{y})^2 = S\,S'(y - \bar{y}_s)^2 + S\,n_s(\bar{y}_s - \bar{y})^2, \tag{7.2}$$

and for the second method, using the Y given by equation (7.1) 
instead of $\bar{y}_s$, we have 

$$S(y - \bar{y})^2 = S(y - Y)^2 + S(Y - \bar{y})^2, \tag{7.3}$$

where $S$ in equation (7.3) stands for $S\,S'$, the summation over all 
individuals. This may be expressed in words: the sum of squares of 
deviations from the grand mean equals the sum of squares of deviations 
from the regression line plus the sum (over all observations) of squares 
of deviations of the regression values from the grand mean. The terms of 
equation (7.3) are entered in Table 7.6, and it is shown in section 7.6 that 
the sums of squares are severally equal to those terms in the table 
which involve $r^2$. The constant r is again the correlation coefficient. 


TABLE 7.6 

Analysis of Variance of y (Egg Girth) 

Source of Variation        Sum of Squares                               Degrees of Freedom   Variance 
Straight regression line   $S(Y - \bar{y})^2 = r^2 S(y - \bar{y})^2$        1                    ${}_yv_S$ 
Residual                   $S(y - Y)^2 = (1 - r^2) S(y - \bar{y})^2$        N - 2                ${}_yv_R$ 
Total                      $S(y - \bar{y})^2$                               N - 1                ${}_yv_T$ 


If the reader has found the sums of squares directly as suggested in 
the last paragraph, he will also be able to check these relations. 
Similarly, by using the regression line of equation (7.11) we may 
analyse the variance of x and arrive at the sums of squares in 
Table 7.61. 


TABLE 7.61 

Analysis of Variance of x (Egg Length) 

Source of Variation        Sum of Squares                               Degrees of Freedom   Variance 
Straight regression line   $S(X - \bar{x})^2 = r^2 S(x - \bar{x})^2$        1                    ${}_xv_S$ 
Residual                   $S(x - X)^2 = (1 - r^2) S(x - \bar{x})^2$        N - 2                ${}_xv_R$ 
Total                      $S(x - \bar{x})^2$                               N - 1                ${}_xv_T$ 


We shall now discuss the degrees of freedom. We have seen that 
in an infinite population for which x and y are independent, the 
regression of y on x is a line through the mean, $\bar{y}$, and parallel to 
the x-axis; it follows that for such a case $(y - Y)$ equals $(y - \bar{y})$ 
and the residual sum of squares equals the total. For a finite sample, 
however, owing to random errors a regression line will have a slight 
slope and the residual sum of squares will be different from the 
total. Moreover, since the regression line is obtained by the method 
of least squares the residual must always be less than the total. It 
follows from a proof given by Fisher (1925^) that if the total sum 
of squares has n degrees of freedom, the amount of this difference 
is, on the average, one nth of the total sum of squares, and that the 
residual sum of squares divided by n - 1 is an unbiassed estimate 
of the variance of y in the population, having the sampling distribution 
of an estimate based on n - 1 degrees of freedom.* Applying 
this result to Table 7.6 and remembering that there are N - 1 
degrees for the total sum of squares, we see that when the association 
in the population is zero, the three variances in the last 
column are estimates of the same variance, that of y in the population. 
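
The partition of the total sum of squares, with its degrees of freedom, may be illustrated by a short computation. The sketch below is in Python and is not the author's procedure for grouped data; it works directly on individual pairs, and the function name `regression_anova` and the data are hypothetical.

```python
def regression_anova(xs, ys):
    """Split S(y - ybar)^2 into regression and residual parts.

    Returns a dict in which each source of variation maps to
    (sum of squares, degrees of freedom, variance), in the layout
    of an analysis-of-variance table for a fitted straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    fitted = [mean_y + slope * (x - mean_x) for x in xs]
    ss_regression = sum((f - mean_y) ** 2 for f in fitted)
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return {
        "regression": (ss_regression, 1, ss_regression / 1),
        "residual": (ss_residual, n - 2, ss_residual / (n - 2)),
        "total": (syy, n - 1, syy / (n - 1)),
        "r_squared": sxy ** 2 / (sxx * syy),
    }

# Hypothetical observations, for illustration only.
table = regression_anova([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.4, 3.6, 5.1])
for source in ("regression", "residual", "total"):
    print(source, table[source])
```

The regression sum of squares comes out as r squared times the total, and the residual as (1 - r squared) times the total, whatever data are supplied.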

Thus, for zero association in the population, Table 7.6 is analogous 
to Table 6.3 with two arrays (one degree of freedom). If, now, there 
is an association between x and y, the difference between the residual 
and total sum of squares is greater than the proportion due to one 
degree, ${}_yv_R$ is less than ${}_yv_T$ and ${}_yv_S$ is greater than either. We may 
carry the analogy between Tables 6.3 and 7.6 further by regarding 
${}_yv_S$ as the variance due to the regression line; we shall make no 
attempt, however, to find an analogue to erf in Table 6.3. The 
value ${}_yv_R$ is an estimate of the true variance within arrays, and also 
of the variance of the distribution represented by a cross-section of 
the frequency surface in a vertical plane parallel to the y-axis. Similar 
arguments may be applied to the analysis in Table 7.61. 

* This is a special case of a more general result that if to a series of N 
independent observations an equation or series of equations involving m 
constants altogether is fitted by the method of least squares, the sum of 
squares of the deviations from the values given by the equations, i.e. the 
residual sum of squares, divided by N - m is an unbiassed estimate of the 
variance in the population. Thus, we may say that each constant so determined 
from the data absorbs one degree of freedom. The one degree of 
freedom associated with the linear regression line is due to the slope; the 
other constant, $\bar{y}$, is eliminated from both the residual and total sum of 
squares. This result also follows from a proof given by Irwin (1931). 

This property is a powerful recommendation for the use of the method 
of least squares in statistical work. 

It is thus seen that the two straight lines found by least squares 
are those which make the residual variances a minimum (or make the 
variance associated with themselves a maximum), and r is a measure 
of the amount of the total variation in each quantity associated with 
the appropriate line. If r = 1, the residual variance is zero and all 
the variation in one quantity is explained by variation in the other, 
while if r = 0, the residual variance is as great as the total, and there 
is no association between x and y. This point of view (since it uses 
$r^2$) takes no account of the sign of r, but is merely concerned with 
measuring the strength of association without troubling whether y 
increases with x or whether it increases as x decreases. 

General Discussion 

7.3. The correlation coefficient is much used as a measure of the 
strength of association, but it is not easy for the beginner to appreciate 
its scale of values. It may vary between plus and minus unity, but 
the sign merely shows the direction of the trend (whether x increases 
with y or whether one increases as the other decreases), and so as a 
measure of association we are only concerned with values between 
0 and 1; a correlation of -0.5 is as good as one of +0.5. If we 
consider samples in which N is so large that for all practical purposes 
N - 1 = N - 2, we see from Tables 7.6 and 7.61 that the 
ratio of the residual to the total variance is practically equal to 
$(1 - r^2)$, and that the ratio of the corresponding standard deviations 
is $\sqrt{1 - r^2}$. These quantities are given in Table 7.7, and the meaning 
of any value of the correlation coefficient between two quantities is 
that if we keep one quantity constant, the variability of the other 
is reduced in the ratio given in the third column of the table. The 
scale is very uneven, since a correlation of 0.6 reduces the residual 
standard deviation to 0.8 of the total, while even if the correlation 
is as high as 0.9, the residual deviations have a standard deviation 
of 0.436 of the total. This unevenness of scale is shown by Tables 
7.1 to 7.5; a far greater change in association is noticeable between 
Tables 7.1 and 7.3 for a smaller change in correlation coefficient 
than between Tables 7.3 and 7.5; i.e. a coefficient of 1.0 is much 
more than twice as good as one of 0.5. Attempts are sometimes 
made to explain correlation in terms of chances, and to state that 
if the coefficient for two characters is 0.6 (say), and knowing one 
of them we use the regression formula to guess at the other, we shall 
be right six times in ten trials. Such an explanation is nonsense, 
and the truth is that the standard error of our guess will be reduced 
to $\sqrt{1 - 0.6^2} = 0.8$ of what it would have been had we not used 
the formula. Such a reduction in standard deviation may be further 
interpreted in terms of reduction in frequencies of estimates deviating 
from the actual values by given amounts, on referring to section 3.52. 
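
The unevenness of the scale is easily seen by tabulating the two ratios for a few values of r. The following Python sketch does so; the function name `scale_of_r` is hypothetical, and the last quantity is computed on the assumption that the z' column of Table 7.7 is the transformation ½ log_e {(1 + r)/(1 - r)}, which is not stated in this section.

```python
import math

def scale_of_r(r):
    """Return (variance ratio, standard deviation ratio, z') for a given r.

    The first two are the residual/total ratios 1 - r^2 and sqrt(1 - r^2).
    The third assumes z' = atanh(r); that identification is an assumption
    made for this sketch and is not defined in the present section.
    """
    variance_ratio = 1 - r ** 2
    sd_ratio = math.sqrt(variance_ratio)
    z_prime = math.atanh(r)
    return variance_ratio, sd_ratio, z_prime

for r in (0.2, 0.4, 0.6, 0.7, 0.8, 0.9):
    v, s, z = scale_of_r(r)
    print(f"r = {r:.1f}   variances {v:.2f}   standard deviations {s:.3f}   z' {z:.2f}")
```

A correlation of 0.6 still leaves the residual standard deviation at 0.8 of the total, which is why moderate coefficients improve a prediction so little.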

TABLE 7.7* 

Correlation       Ratio of Variances    Ratio of Standard    z' 
Coefficient r     Residual/Total        Deviations 
zero              1.00                  1.000                zero 
0.2               0.96                  0.980                0.20 
0.4               0.84                  0.917                0.42 
0.6               0.64                  0.800                0.69 
0.7               0.51                  0.714                0.87 
0.8               0.36                  0.600                1.10 
0.9               0.19                  0.436                1.47 
1.0               zero                  zero                 infinity 

* The column under z' will be referred to in the next chapter. 


The following typical cases may assist the reader. Yule (1927) 
gives the correlation coefficient between the ages of husbands and 
wives in England and Wales as 0.91, and that between the age and 
the standard of elementary school boys is of the same order (Jones, 
1910). The likeness between parents and children is expressed by a 
coefficient of about 0.47 as far as height and several other physical 
characters are concerned (Snow, 1911), and that between cousins or 
uncles and nephews, which is so small as scarcely to be noticeable 
to the "man in the street," has a correlation coefficient of about 0.26. 
Similarly, taking periods of six days as units, the correlation between 
the rainfall and hours of bright sunshine in Hertfordshire is negative 
(i.e. increased rainfall is associated with decreased sunshine) and 
about 0.2. 

The uses (and abuses) of the coefficient of correlation are many, 
but it may be always safely regarded as a constant descriptive of the 
properties of the sample. For instance, Weldon found that the correlation 
between number of pistils and stamens of late flowers of 
Table 7.2 was 0.75, while for the early ones, Table 7.3, it was 0.51; 
that change in the strength of relationship is as much a characteristic 
of the flowers as a change (say) in mean height of the plants. Fisher 
and Hoblyn (1928) give tables to show that the correlation between 
maximum and minimum temperature is about 0.71 in January and 
0.30 in August; this change is gradual from month to month and is 
a quality of the climate. 

If a linear regression formula is being used to predict one 
character from another, the correlation coefficient may safely be used 
to define the accuracy of the prediction, as has been shown, the 
standard deviation of the differences between actual and predicted 
values being $\sqrt{1 - r^2}$ times the standard deviation of the dependent 
variable. 

A rather more dangerous use of this constant, however, is as evidence 
of causation. If two quantities are associated (as shown by the 
correlation coefficient) the inference often made is that one is a 
cause and the other an effect. Such an inference is often erroneous 
when dealing with quantities susceptible to such close control that 
the correlation coefficient is unity, but it is particularly unsafe when 
there are uncontrolled variations and the relationship is not exact. 
Often two quantities are both affected in the same way by a third 
so that they appear to be related, when actually neither if altered 
independently would have any effect on the other. For instance, 
Yule refers to the fact that the proportion of marriages contracted 
outside the Church of England has for many years been 
increasing, while the average age at death has also been increasing, 
and there is a positive correlation; but no one supposes that there is 
a causal relationship and that a law prohibiting the solemnisation of 
marriages in churches would have the effect of improving the longevity 
of the nation. When investigating causation, it is usually well 
to decide first on other grounds that a causal relationship between 
two factors is likely, and then to conduct a close analysis of other 
possible factors before using the correlation coefficient as evidence. 
Care, common sense, imagination, and a technical knowledge of the 
subject to which it is applied are particularly necessary in this use 
of correlation. We shall in Chapter XI discuss methods of analysing 
data for, and eliminating, other factors. 

When considered as the expression of the relationship between 
two quantities (the second method of approach above), the correlation 
coefficient merely measures the importance of the variations 
in one quantity associated with the other (the independent variable) 
relative to all other variations, and if the variance of the independent 
variable can be controlled, either experimentally or by selection, 
the correlation can be altered. It is thus not a physical quantity in 
the way the regression is. To demonstrate this, we will suppose the 
independent variable to be x, the dependent one y, the regression a, 
the correlation coefficient r, and the standard deviations $\sigma_x$ and $\sigma_y$. 
Then if $\sigma_R$ is the standard deviation of the residual deviations of y 
from the regression line, which we may suppose to be constant for 
all values of x, 

$$\sigma_R = \sigma_y \sqrt{1 - r^2}.$$

But 

$$a = r\,\frac{\sigma_y}{\sigma_x},$$

whence 

$$r = \frac{a\,\sigma_x}{\sqrt{\sigma_R^2 + a^2 \sigma_x^2}}.$$

Now $\sigma_R$ and a are assumed to be constant, so that r is not independent 
of $\sigma_x$. For the relationship between numbers of pistils x and 
stamens y, r = 0.749; $\sigma_x$ = 3.388, $\sigma_y$ = 3.298, $\sigma_R$ = 2.185, 
a = 0.739 and $a\sigma_x$ = 2.470. If now we had selected flowers so that 
the standard deviation of the number of pistils had been reduced by 
one-half, we should have reduced the correlation coefficient from 
0.749 to 0.493, and we should have increased it to 0.914 if we had 
been able to increase the variation of pistils so as to double the standard 
deviation (these results may be obtained by substitution in the 
above formula). The same consideration applies when the effect of 
some factor like temperature on some variable quantity is being 
investigated experimentally; by altering the range of temperatures, 
the correlation coefficient may be altered considerably, and so it is 
a result of the arbitrary experimental conditions. When such conditions 
are under the control of the experimenter, the amount of 
variation in x should be made as great as possible so as to make r 
large; we shall see in the next chapter that this gives a maximum 
precision to the measurement of a. A low correlation coefficient 
does not necessarily mean that x is incapable of having an important 
influence on y; it may mean that an insufficiently large range of x 
has been tried. Or if x cannot be varied, it may be that although 




its effect on y as measured by the correlation coefficient is small 
relative to disturbing factors, its absolute effect as measured by the 
regression coefficient is important. The correlation coefficient between 
the yield of some crop over several plots and quantities of fertiliser 
used may be low, but if the regression is appreciable, the total 
advantage of the use of fertiliser by the whole agricultural industry 
will be considerable. In such instances, except as an indication of 
the existence of association, the coefficient of correlation is an un- 
suitable constant, and that of regression is more suitable. 
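
The dependence of r on the spread of the independent variable, discussed above for the pistils and stamens, may be sketched numerically. The Python below assumes the relation that r equals aσ_x divided by the square root of (σ_R squared plus a squared σ_x squared), with a and σ_R held constant; the function name `correlation_for_spread` and the constant names are hypothetical.

```python
import math

def correlation_for_spread(a_sigma_x, sigma_r, scale=1.0):
    """Correlation when the standard deviation of x is multiplied by `scale`.

    The regression a and the residual standard deviation sigma_r are
    taken as fixed, as the text supposes.
    """
    explained = a_sigma_x * scale
    return explained / math.sqrt(explained ** 2 + sigma_r ** 2)

# Rounded constants quoted in the text for pistils (x) and stamens (y).
A_SIGMA_X = 2.470   # a times the standard deviation of x
SIGMA_R = 2.185     # residual standard deviation of y

for scale in (0.5, 1.0, 2.0):
    r = correlation_for_spread(A_SIGMA_X, SIGMA_R, scale)
    print(f"standard deviation of x multiplied by {scale}: r = {r:.3f}")
```

Because the constants are rounded, the results agree with the 0.493 and 0.914 of the text only to within a unit or so in the third decimal place.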

Those whose previous outlook has been physical often prefer to 
regard the scatter of points on a scatter diagram as being due to dis- 
turbing factors akin to experimental errors, and as obscuring an exact 
physical relationship which is assumed to underlie the data. To such 
people the regression obtained by the method of least squares 
appears to represent the best possible approach to the true relation- 
ships, and when they realise that there are two regression lines, they 
are puzzled to know which to take. Such an interpretation of a scatter 
diagram or correlation table may be a little dubious, and in any case, 
without additional knowledge, it is not possible to arrive at a theoretically 
satisfactory unique line. It is safer to regard all the features 
of the table, including the scatter, as equally real properties of the 
sample, and to regard the means, variances, correlations and regressions 
as so many constants descriptive of some of those properties. 

To sum up, there are three important constants that express the 
properties of a correlation table: 

(1) The correlation coefficient that measures the importance of 
the variation in y associated with x relative to the total 
variation, 

(2) The regression coefficient that measures the average amount 
of increase or decrease in y per unit increase in x, and 

(3) The residual variance that measures the scatter of values of y 
about the regression line. 

For different populations, these constants are independent in that 
a high value of one constant does not necessarily mean a high or 
low value of either of the others. 

Estimation of Coefficients 

7.4. The method of maximum likelihood may be applied to determining 
estimates of the parameters of the normal frequency surface 
of section 7.21. For $\bar{x}$, $\bar{y}$, $\sigma_x$ and $\sigma_y$ it gives the estimates already 
deduced for single distributions and for r it gives 

$$r = \frac{S(x - \bar{x})(y - \bar{y})}{\sqrt{S(x - \bar{x})^2\, S(y - \bar{y})^2}}, \tag{7.4}$$

where S is the summation over all individuals. 

To be consistent with the earlier treatment of frequency distributions 
we should denote the population value by $\rho$ and describe the 
above value of r as an estimate obtained from a sample. 

From equations (7.1), (7.11) and (7.4) we have for the regression 
coefficients 

$$\text{Regression of } y \text{ on } x = r\,\frac{\sigma_y}{\sigma_x} = \frac{S(x - \bar{x})(y - \bar{y})}{S(x - \bar{x})^2} \tag{7.5}$$

and 

$$\text{Regression of } x \text{ on } y = r\,\frac{\sigma_x}{\sigma_y} = \frac{S(x - \bar{x})(y - \bar{y})}{S(y - \bar{y})^2}. \tag{7.51}$$

If N is the size of the sample, 

$$\frac{S(x - \bar{x})(y - \bar{y})}{N}$$

is called the first product moment (= p, say), whence we see that 

$$r = \frac{N}{(N - 1)}\,\frac{p}{\sigma_x \sigma_y}, \tag{7.41}$$

where $\sigma_x$ and $\sigma_y$ are calculated correctly on N - 1 degrees of freedom; 
if the sample is large N/(N - 1) may be taken as nearly equal 
to unity. 

For moderately small samples, r may be calculated directly from 
the individual observations, using equation (7.5), but when the sample 
is large and the data are grouped, a rather more elaborate method of 
computation, such as is illustrated in the next section, is advisable. 
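
For a small sample the constants of this section may be computed directly from the individual observations. The Python below is a minimal sketch of the sums that make up r, the two regression coefficients and the first product moment; the function name `correlation_constants` and the data are hypothetical.

```python
import math

def correlation_constants(xs, ys):
    """Compute r, both regression coefficients and the product moment p."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
    reg_y_on_x = sxy / sxx                 # regression of y on x
    reg_x_on_y = sxy / syy                 # regression of x on y
    p = sxy / n                            # first product moment
    sigma_x = math.sqrt(sxx / (n - 1))     # on N - 1 degrees of freedom
    sigma_y = math.sqrt(syy / (n - 1))
    r_from_p = n * p / ((n - 1) * sigma_x * sigma_y)
    return {"r": r, "y_on_x": reg_y_on_x, "x_on_y": reg_x_on_y,
            "p": p, "sigma_x": sigma_x, "sigma_y": sigma_y,
            "r_from_p": r_from_p}

# Hypothetical observations, for illustration only.
c = correlation_constants([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.4, 3.6, 5.1])
print(c["r"], c["r_from_p"])                            # the two routes agree
print(c["y_on_x"], c["r"] * c["sigma_y"] / c["sigma_x"])
```

The agreement of the two printed pairs is a useful check that the factor N/(N - 1) has been applied when the standard deviations are on N - 1 degrees of freedom.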


Computation of Coefficients 

7.5. The denominators of the above equations may be computed 
by the methods of section 1.31 or 6.3, and here we need only concern 
ourselves with the product term in the numerators. It will be convenient 
to use similar methods. Corresponding to equation (1.2), 
p. 37, we have 

$$p = \frac{Sxy}{N} - \bar{x}\bar{y}, \tag{7.6}$$

and corresponding to equation (6.5), p. 131, we have 

$$S(x - \bar{x})(y - \bar{y}) = Sxy - N\bar{x}\bar{y} = Sxy - \frac{Sx\,Sy}{N}, \tag{7.61}$$

where x and y may be measured from any convenient origin in the 
true units. The correcting term in the above equations may be 
positive or negative, and if either $\bar{x}$ or $\bar{y}$ is zero, this term is zero. 

When the observations are grouped, all those in any group may 
be assumed to be at its centre and it is convenient to measure the 
deviations from a group near the middle of each distribution in 
terms of the sub-range $h_x$ or $h_y$. Then if p' is the product moment 
in such units, 

$$p = h_x h_y\, p'. \tag{7.62}$$

Sheppard's corrections may be applied in calculating the denominators 
of equations (7.4) and (7.5) from grouped data; there is no 
corresponding correction for the numerator. 

We shall compute the constants for Table 7.1 as an example, 
where the arbitrary values x' and y' are given in the column or 
row adjacent to that containing the true values of the variate. The 
computations are all contained in Table 7.8, where columns (1), (2), 
(3), (4), (7), (8), (9) contain all the data for obtaining the 
means and the second moments, and below the table all the calculations 
have been carried out as in the first chapter. Sheppard's 
corrections have been applied, and as the sample is a large one, the 
second moment has been obtained by dividing the sums of squares 
by the number of observations (955) instead of by the degrees of 
freedom. 

To find the product moment, we must first find $Sx'y'$. This could 
be done directly by writing in the corner of each cell in Table 7.1 
its x'y' (the one in the top left-hand corner would be -14 × -11 
= +154, the next in the top row would be -14 × -10 = +140, 
and so on), and then multiplying each x'y' by the number in the cell 
and adding; in this table, all the squares in the two quadrants 
containing most observations would have positive products, and those 
in the other two would have negative ones. A better method, however, 
is to do the summation in two parts, keeping y' constant first, 
summing all the x's in each array of y' separately, and then adding 
the arrays. If we consider any array of y' with $n_y$ observations: 

$$Sx'y' = S\,y'\,S'x' = S\,y'\,n_y\,\bar{x}_y, \tag{7.63}$$

where the summations are those used previously, and $\bar{x}_y$ 
is the mean of all the observations in the sth array of y'. 

The process is carried out in columns (5) and (6) of Table 7.8. 
In the array y' = -14, $\bar{x}_y$ is -11/1, so $n_y\bar{x}_y$ = -11; for y' = -9, 
$\bar{x}_y$ = -12/2, so $n_y\bar{x}_y$ = -12, and so on for the other quantities in 
column (5); the sum of this column is the sum of all the deviations of x' 
from the arbitrary origin, and so should equal the sum of column (9). 
The terms of column (6) are the products of those in columns (1) 
and (5), and their sum is the required $Sx'y'$. As a check on the 
arithmetic, this may be done the other way, summing first for each 
separate array of x' and adding them. This is done in columns (11) 
and (12), and if the arithmetic is correct, the totals of columns 
(11) and (3) and those of columns (6) and (12) should be equal. 
There is no independent check of columns (4) and (10), and so they 
should be repeated carefully. 

The product moment is calculated below Table 7.8 by applying 
equation (7.6) to the arbitrary values and then correcting by equation 
(7.62). In finding r and the regressions, we have used p, $\sigma_x$ 
and $\sigma_y$ in the natural units of centimetres, but it would be just as 
easy to maintain arbitrary units throughout and then to correct the 
regressions by multiplying by $h_y/h_x$ and $h_x/h_y$. The correlation 
coefficient, being a pure number, needs no such correction. 
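
The grouped computation may be retraced from the column totals alone. The Python sketch below is an illustration only: the totals 955, 1 226, 12 224, 1 241 and 11 119 and the origin 11.25 cm. are those of the computations for Table 7.1, while the sub-ranges (0.05 and 0.1 cm.), the origin 4.125 cm. and the total 12 543 for the squares of x' are inferred from the quoted results and should be treated as assumptions. All variable names are hypothetical.

```python
import math

N = 955                        # number of observations
SUM_Y, SUM_YY = 1226, 12224    # S y' and S y'^2 in arbitrary units
SUM_X, SUM_XX = 1241, 12543    # S x' and S x'^2 (the latter inferred)
SUM_XY = 11119                 # S x'y'
H_X, H_Y = 0.05, 0.1           # sub-ranges in cm (inferred)
ORIGIN_X, ORIGIN_Y = 4.125, 11.25   # arbitrary origins in cm (x origin inferred)
SHEPPARD = 1 / 12              # Sheppard's correction in units of the sub-range

mean_x_arb = SUM_X / N
mean_y_arb = SUM_Y / N
mean_length = ORIGIN_X + H_X * mean_x_arb
mean_girth = ORIGIN_Y + H_Y * mean_y_arb

# Second moments about the mean, corrected, then converted to cm squared.
var_x = H_X ** 2 * (SUM_XX / N - mean_x_arb ** 2 - SHEPPARD)
var_y = H_Y ** 2 * (SUM_YY / N - mean_y_arb ** 2 - SHEPPARD)
sigma_x, sigma_y = math.sqrt(var_x), math.sqrt(var_y)

# Product moment in arbitrary units, then in cm squared (no correction).
p_arb = SUM_XY / N - mean_x_arb * mean_y_arb
p = H_X * H_Y * p_arb

r = p / (sigma_x * sigma_y)
reg_y_on_x = p / var_x
reg_x_on_y = p / var_y

print(f"means: length {mean_length:.3f} cm, girth {mean_girth:.3f} cm")
print(f"sigma_x {sigma_x:.5f}, sigma_y {sigma_y:.5f}, p {p:.6f}")
print(f"r {r:.3f}, y on x {reg_y_on_x:.3f}, x on y {reg_x_on_y:.4f}")
```

Working throughout in arbitrary units and converting only at the end keeps the arithmetic in small whole numbers, which is the purpose of the device.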

TABLE 7.8 

  (1)      (2)       (3)          (4)            (5)                 (6) 
  $y'$     $n_y$     $n_y y'$     $n_y y'^2$     $n_y \bar{x}_y$     $n_y \bar{x}_y y'$ 

  -14        1       -14          196            -11                 154 
  -13 
  -12 
  -11 
  -10 
   -9        2       -18          162            -12                 108 
   -8        2       -16          128            -12                  96 
   -7        6       -42          294            -32                 224 
   -6       15       -90          540            -75                 450 
   -5       10       -50          250            -43                 215 
   -4       35      -140          560           -117                 468 
   -3       38      -114          342           -110                 330 
   -2       79      -158          316           -141                 282 
   -1       90       -90           90            -65                  65 
    0      119         0            0             +9                   0 
    1      114      +114          114            141                 141 
    2      101       202          404            185                 370 
    3       98       294          882            284                 852 
    4       76       304        1 216            285               1 140 
    5       66       330        1 650            283               1 415 
    6       48       288        1 728            265               1 590 
    7       29       203        1 421            186               1 302 
    8       16       128        1 024            122                 976 
    9        6        54          486             60                 540 
   10        3        30          300             28                 280 
   11        1        11          121             11                 121 

  Totals   955    +1 226       12 224         +1 241             +11 119 

Girth (y): 

  $\bar{y}'$ = +1 226/955 = +1.283 77; mean girth = 11.25 + 0.128 = 11.378 cm. 

  12 224/955           = 12.800 0 
  $-\bar{y}'^2$              = -1.648 1 
                       = 11.151 9 
  Sheppard's correction = -0.083 3 
  second moment of y'  = 11.068 6 

  $\sigma_y^2 = h_y^2 \times 11.068\,6$ = 0.110 69 cm.² 
  $\sigma_y$ = 0.332 70 cm. 

Length (x): 

  mean length = 4.125 + 0.065 = 4.190 cm. 

  12 543/955           = 13.134 0 
  $-\bar{x}'^2$              = -1.688 6 
                       = 11.445 4 
  Sheppard's correction = -0.083 3 
  second moment of x'  = 11.362 1 

  $\sigma_x^2 = h_x^2 \times 11.362\,1$ = 0.028 405 cm.² 
  $\sigma_x$ = 0.168 54 cm. 

Product moment: 

  +11 119/955          = 11.642 9 
  $-\bar{x}'\bar{y}'$          = -1.668 2 
  p'                   = +9.974 7 

  $p = h_x h_y \times 9.974\,7$ cm.² = +0.049 874 cm.² 

  r = + 0.049 874 / (0.332 70 × 0.168 54) = + 0.889 

Regressions: 

  y on x = + 0.049 874 / 0.028 405 = + 1.756 cm./cm. 

  x on y = + 0.049 874 / 0.110 69 = + 0.450 6 cm./cm. 


Regression Lines and Sums of Squares: Proofs 

7.6. Let x and y be the individual values of the variate with means 
$\bar{x}$ and $\bar{y}$; then it is required to find the constants a and b of the 
following equation: 

$$Y = ax + b, \tag{7.7}$$

such that $S(y - Y)^2$ is a minimum. 

From (7.7) 

$$S(y - Y)^2 = S(y - ax - b)^2,$$

and differentiating this with respect to a and b respectively, and 
equating to zero (the usual method of evaluating constants to reduce 
a function of them to a minimum), we have 


$$-2\,Sx(y - ax - b) = 0,$$
$$-2\,S(y - ax - b) = 0;$$

or expanding term by term, and cancelling out common terms, 

$$Sxy = a\,Sx^2 + b\,Sx,$$
$$Sy = a\,Sx + b\,S1. \tag{7.8}$$

Now $Sx = N\bar{x}$, $Sy = N\bar{y}$ and $S1 = N$, and using these we may 
solve (7.8) to obtain 

$$a = \frac{Sxy - N\bar{x}\bar{y}}{Sx^2 - N\bar{x}^2}$$

and 

$$b = \bar{y} - a\bar{x}. \tag{7.81}$$

We have seen that $Sxy - N\bar{x}\bar{y} = S(x - \bar{x})(y - \bar{y})$ and $Sx^2 - N\bar{x}^2 
= S(x - \bar{x})^2$; using these in (7.81) and substituting in (7.7) we 
obtain 

$$Y - \bar{y} = \frac{S(x - \bar{x})(y - \bar{y})}{S(x - \bar{x})^2}\,(x - \bar{x}), \tag{7.71}$$

which reduces to regression equation (7.1), when the value for r 
given in equation (7.4) and the standard deviations are substituted 
for the sums of products and squares. 

Rewriting equation (7.3) by substituting the regression value of 
equation (7.1) for Y, we have 

$$S(y - \bar{y})^2 = S\left\{(y - \bar{y}) - r\,\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right\}^2 + r^2\,\frac{\sigma_y^2}{\sigma_x^2}\,S(x - \bar{x})^2. \tag{7.9}$$

Now 

$$\frac{\sigma_y^2}{\sigma_x^2} = \frac{S(y - \bar{y})^2}{S(x - \bar{x})^2}$$

and the second term on the right-hand side of equation (7.9) becomes 

$$r^2\,S(y - \bar{y})^2.$$

Also expanding the first term on that side and substituting for r 
from equation (7.4) we find that the term reduces to 

$$(1 - r^2)\,S(y - \bar{y})^2.$$

Equation (7.9) may thus be written 

$$S(y - \bar{y})^2 = (1 - r^2)\,S(y - \bar{y})^2 + r^2\,S(y - \bar{y})^2, \tag{7.91}$$

and these are the sums of squares entered in Table 7.6. 

Further, by substituting for r in equation (7.91) we see that the 
sum of squares of the deviations of y from the regression line is 

$$S(y - Y)^2 = S(y - \bar{y})^2 - \frac{\{S(x - \bar{x})(y - \bar{y})\}^2}{S(x - \bar{x})^2}.$$

We shall use this in subsequent chapters. 

By a similar series of calculations, interchanging x and y, we can 
obtain the regression line and sums of squares for x. 


CHAPTER VIII 


SAMPLING ERRORS OF CORRELATION AND 
REGRESSION CONSTANTS 

Like all other statistical constants, the correlation coefficient is 
subject to errors of random sampling, so that different samples 
from the same population give varying values which themselves 
form a frequency distribution. In this chapter we shall deal with 
this distribution and with tests of significance based upon it, and 
with the sampling errors of regression constants. 

Populations with Zero Association 

8.1. One of the commonest tests is to see if an observed correlation 
coefficient is significantly greater than zero, and to do this, the probability 
that such a value could arise in a random sample from a 
population having no association has to be found. For samples 
from such a population, the standard error of the distribution of r is 
$\pm 1/\sqrt{N - 1}$ (or for large numbers, $\pm 1/\sqrt{N}$ is an adequate 
approximation) where N is the number of individuals, i.e. the 
number of pairs of readings of x and y. Unless the sample is very 
small, the distribution is near enough to normality to justify the use 
of the criterion that a value of r more than twice the standard error 
for zero association is above the 0.05 level of significance. For the 
data of Table 7.5 there are 4 690 observations, and the correlation 
between reaction time to sight and head length is -0.034, with a 
standard error of ±0.015. The value is a little more than twice its 
standard error, and so is suggestive of a real, although exceedingly 
weak, association. It would not be safe, however, to deduce very 
much from such a result; indeed, a correlation of that sort might 
conceivably arise if there were a small personal error in measurement 
such that the observer who overestimated the head-length tended 
slightly to underestimate the reaction time, and if several observers 
took the measurements, the same one always measuring both characters 
on the same individual. For the maximum and minimum 
temperatures of Table 7.4, N = 1 518, r = +0.30, and being more 
than ten times its standard error ($\pm 1/\sqrt{1\,517} = \pm 0.026$) 
is undoubtedly real. Hence, sufficient observations can establish the 
reality of an exceedingly weak association, and the probability of 
its significance is no guide to the closeness of the relationship. 
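
The rough criterion of twice the standard error may be applied mechanically. The Python sketch below uses the standard error for zero association given above; the function name `zero_association_check` is hypothetical, and the two calls use the values of N and r quoted for Tables 7.5 and 7.4.

```python
import math

def zero_association_check(r, n):
    """Return the standard error of r for zero association and |r| / s.e."""
    standard_error = 1 / math.sqrt(n - 1)
    return standard_error, abs(r) / standard_error

# Reaction time and head length (Table 7.5): weak but just significant.
se, ratio = zero_association_check(-0.034, 4690)
print(f"s.e. = {se:.3f}, r is {ratio:.1f} times its standard error")

# Maximum and minimum temperature (Table 7.4): undoubtedly real.
se, ratio = zero_association_check(0.30, 1518)
print(f"s.e. = {se:.3f}, r is {ratio:.1f} times its standard error")
```

The ratio measures only the evidence that an association exists; it says nothing of how close the association is.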

We have seen, however, from Tables 7.6 and 7.61 that correlation 
may be expressed as an analysis of variance, where one degree of 
freedom is taken by the regression line and (N — 2) are taken by 
the residuals. For a sample from a population with zero correlation 
the two variances yV^ and yV^, are independent estimates of the same 
variance, and we may test for the significance of their difference by 
finding 


« = i log, 


= i log. 


r%N-2) 

I _r2 


. (8.1) 


and seeing if it lies beyond the 5 or i per cent, point of Fisher’s tables 
for Ml = I and n^ — N — 2. Alternatively, according to section 6.5 
we may use the f-test, finding 


t 



r\N-z) 

I — r® 


. (8.2) 


and testing its significance for iV — 2 degrees of freedom. 

Fisher (1936) has prepared tables giving values of the correlation
coefficient on different levels of significance for samples of various
sizes. When there are 12 degrees of freedom (N = 14), r = 0.532 4
lies on the 0.05 level, and making the above transformation, we
find that z = 0.778 8 and t = 2.179. Both of these lie on the 5 per
cent. level of their respective distributions when $n_1 = 1$ and
$n_2 = 12$, or n = 12. We can now check the adequacy of the assumption of
the normal distribution of r for moderately large samples; when
N = 20 (say), $1/\sqrt{N-1} = 0.229$, and a correlation coefficient of
0.46 lies on the 0.05 level, while Fisher’s tables give the exact
value for 18 (= N − 2) degrees of freedom as 0.444; when N = 10
the corresponding values are 2 × 0.333 = 0.67 for the normal
distribution and 0.632 for the true distribution, and at N = 5 they
are 2 × 0.500 = 1.0 and 0.878. As far as the 0.05 level of significance
goes, the assumption of normality is not likely to lead to
serious error even for samples as small as 10, but below that it
makes the test unnecessarily stringent; the true test based on z or t,
however, is correct for samples of all sizes.
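
The arithmetic of this check is easily reproduced. The following Python
sketch is an illustration only and not part of the original treatment:
the function names are our own, and the 5 per cent. values of t are
assumed from the usual tables. It evaluates (8.1) and (8.2) for the
worked case and inverts (8.2) to recover the exact 0.05 points of r
beside the large-sample rule of twice $1/\sqrt{N-1}$.

```python
import math

def t_from_r(r, n):
    """Equation (8.2): t for a sample correlation r from n pairs (n - 2 d.f.)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def z_from_r(r, n):
    """Equation (8.1): half the natural logarithm of the variance ratio."""
    return 0.5 * math.log(r * r * (n - 2) / (1.0 - r * r))

def critical_r(t_crit, df):
    """Invert (8.2): the value of r that just reaches t_crit on df degrees."""
    return t_crit / math.sqrt(t_crit * t_crit + df)

# Worked figures from the text: N = 14, r = 0.5324.
t_14 = t_from_r(0.5324, 14)
z_14 = z_from_r(0.5324, 14)

# Two-sided 5 per cent points of t (assumed from tables), keyed by d.f.
T_05 = {18: 2.101, 8: 2.306, 3: 3.182}

# For each N: (normal approximation 2/sqrt(N - 1), exact 0.05 point of r).
comparison = {
    df + 2: (2.0 / math.sqrt(df + 1), critical_r(t, df))
    for df, t in T_05.items()
}
```

On these assumptions the exact points come out close to the tabulated
0.444, 0.632 and 0.878, against 0.46, 0.67 and 1.0 from the normal
approximation.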

Contrary to an opinion often expressed, an association measured by
the correlation coefficient may be significant in a very small sample
if it is strong enough; and this is in accordance with common
experience. A physicist, for instance, if he obtains on a graph half a dozen

TABLE 8.1

Percentage Protein Content

                       Corn Selected for
          High Protein Content           Low Protein Content
Year      Mean $y_1$   Variability $v_1$   Mean $y_2$   Variability $v_2$

1896      10.93         9.50             10.93         9.50
1897      10.99        10.90             10.63         8.47
1898      10.98        12.25             10.49        12.61
1899      11.62        12.00              9.59        10.50
1900      12.62         8.09              9.13        11.34
1901      13.78         8.48              9.63        11.47
1902      12.90         8.50              7.86         9.60
1903      13.51        10.04              8.00        10.42
1904      [illegible]   9.05              8.27         9.91
1905      14.73         8.55              8.58         9.92
1906      14.26         9.29              8.65        10.64
1907      13.90        10.72              7.3[?]      13.57
1908      13.94        12.92              8.96        14.06
1909      13.29        10.76              7.48        12.57
1910      14.87         9.68              8.26        10.41
1911      [illegible]  12.98              7.90        14.81
1912      14.49         7.80              8.23         9.96
1913      14.83         8.23              7.71        12.32
1914      15.04         9.44              7.67        12.39
1915      14.54        10.19              7.27        11.69
1916      [illegible]   8.56              8.68        12.86
1917      14.45        12.80              7.09        10.01
1918      15.49         8.78              7.13        10.52
1919      14.70        10.54              6.46         8.05
1920      14.01        12.78              7.54        11.80
1921      16.66        11.04              9.24        14.77
1922      17.34         7.15              7.42         9.43
1923      16.53         8.50              6.48        11.27
1924      16.60         7.27              8.38        13.9[?]


points which lie anywhere near a straight line does not say that there
is no relationship but assumes one and draws his line. This is the
more extreme case in which the association is high, and the correlation
coefficient is a device for measuring and testing the association more
objectively and exactly, and is particularly useful in border-line cases.

We shall take as example some data by Winter (1929), for which
corn was grown for twenty-nine years in two series of plots; in the
first, the seed for one year was from ears of corn in the previous
year’s crop in that series, selected because it had high protein
content, and in the second the seed for one year was selected from
the previous year’s crop because of low protein content. The data
given in Table 8.1 are the mean percentages of protein for each
year, and the coefficients of variation per cent. from ear to ear, and
we will call them $y_1$, $v_1$, $y_2$ and $v_2$. The mean protein percentages of
the first series ($y_1$) seem to have increased progressively, and those
of the second series appear to have decreased, while it is difficult
to see what tendency the variabilities show; we will investigate these
trends by means of the correlation coefficients, thus assuming them
to be sufficiently well expressed by a linear relationship with time.
The correlation coefficients between time and $y_1$, $v_1$, $y_2$ and $v_2$ are
+0.862, −0.081, −0.708 and +0.256 respectively.* For 27
degrees of freedom, r = 0.367 lies on the 0.05 level of significance
so that on the average the mean percentages of protein have increased
significantly in the first series and decreased in the second with time,
but the continued selection has had no measurable effect on the
variability.
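
The comparison just made can be sketched in a few lines of Python. This
is an illustration under stated assumptions: the 5 per cent. point of t
for 27 degrees of freedom (2.052) is taken from the usual tables, and
the dictionary labels are our own. The 0.05 point of r is recovered by
inverting the t-test of section 8.1.

```python
import math

# Two-sided 5 per cent point of t for 27 degrees of freedom (assumed from tables).
T_05_27 = 2.052

# Invert t = r * sqrt(df / (1 - r^2)) to get the value of r on the 0.05 level.
r_crit = T_05_27 / math.sqrt(T_05_27 ** 2 + 27)

# Correlations with time quoted in the text; the keys are our own labels.
correlations = {"y1": 0.862, "v1": -0.081, "y2": -0.708, "v2": 0.256}
significant = {name: abs(r) > r_crit for name, r in correlations.items()}
```

Only the two correlations of the mean protein percentages exceed the
critical value; those of the variabilities do not.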

Populations in which Association is not Zero 

8.2. When the association is not zero, the standard error of the
correlation coefficient is

$$\frac{1-\rho^2}{\sqrt{N-1}}$$

where $\rho$ is the correlation coefficient in the population; but this is
not of much use in testing whether one observed value is greater than
another, because (a) unless $\rho$ is small, the distribution of r, the
correlation coefficient of the sample, is far from normal, and (b) the
population value ($\rho$) is unknown, and if the sample value (r) is
substituted the error is likely to be serious. These considerations

* The computation is much simplified if the year 1910 is chosen as zero
time, and the others are labelled −1, −2, ..., +1, +2, ...; the mean time is then 0.
become particularly important when $\rho$ approaches 0.8, for the
sampling curve becomes exceedingly skew; for evidence of this, the
reader is referred to A Co-operative Study (1917), in which very full
distributions are worked out by K. Pearson and his colleagues from
a formula developed by Fisher (1915).*

These difficulties in the use of the sampling distribution of r may
be overcome by making a transformation suggested by Fisher (1921),
which consists in writing

$$z' = \tanh^{-1} r = \tfrac{1}{2}\{\log_e(1+r) - \log_e(1-r)\} \qquad (8.3)$$

This is not quite the same as the z found from the ratio of two
variances, so we have added the dash to make a distinction. This
quantity has a standard error of

$$\frac{1}{\sqrt{N-3}}$$

(i.e. it is independent of the value of $\rho$), and although the distribution
is not quite normal, it is nearly so, and for most tests, even on
small samples, normality may be assumed. Fisher (1936) gives a
table relating z' to r, and some of the values are entered in Table 7.7;
when r = 1.0, z' = ∞, and the whole effect of the transformation
is to give a more open scale for z' when the association is high,
when, as we saw in the last chapter, small changes in r correspond
to important changes in the variance absorbed by the regression line.

Significance of Differences between Correlations 
8.21. Because its distribution is more nearly normal, Fisher recommends
the z' transformation when testing the significance of the
deviation of an observed correlation from zero; but the effect of the
transformation is small for small values of r, and this step is not
important. Differences of observed correlations from some assumed
population value, or between pairs of observed correlations, may be

* It may be well to mention here a fairly common fallacy in the use of the
standard error for testing the significance of an observed coefficient, r. As
an approximation, $(1 - r^2)/\sqrt{N-1}$ is often written for the unknown
$(1 - \rho^2)/\sqrt{N-1}$, and if any observed correlation is greater than twice the
former standard error it is judged to be significant; but if the hypothesis
being tested is that $\rho = 0$, the true standard error of r is $1/\sqrt{N-1}$ as
shown in the previous section.
tested by using z' and its standard error just as we did the mean
and its standard error in large samples.

For example, Snow (1911) gives the correlation coefficient between
brothers and sisters, measured for a variety of characters, as 0.521,
while Fisher (1928) found from 34 pairs of unlike sex from triplet
births the correlation coefficient for cubit to be 0.645; can this
difference be attributed to random errors? The values of z' are
0.578 and 0.767, and the standard error of the second is

$$\pm\frac{1}{\sqrt{31}} = \pm 0.180$$

(we assume the first determination to be the exact population
value, since it is based on many more observations); the difference
is less than twice the standard error and is insignificant.

In this example, deviations of z' of ± 0.360 from the population
value (0.578) lie on the 0.05 level of significance, giving values of
z' on this level of 0.218 and 0.938; the corresponding values of r
are 0.214 and 0.734. Thus the chances are about 20:1 against a
random sample of 34 pairs giving values of r outside these limits,
but we regard values anywhere between them as common experience;
the deviations of the limits of r from 0.521 are −0.307 and +0.213,
and their inequality shows the skewness of its distribution.
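
The steps of this example may be sketched as follows. The Python below
is illustrative only; the function and variable names are our own, and
the limits are set at twice the standard error, as in the text.

```python
import math

def fisher_z(r):
    """Equation (8.3): z' = tanh^-1 r."""
    return math.atanh(r)

def se_fisher_z(n):
    """Standard error of z' for a sample of n pairs."""
    return 1.0 / math.sqrt(n - 3)

rho_assumed = 0.521   # Snow's value, treated as the population value
r_observed = 0.645    # Fisher's value from 34 pairs
n_pairs = 34

z_pop = fisher_z(rho_assumed)
z_obs = fisher_z(r_observed)
se = se_fisher_z(n_pairs)
ratio = (z_obs - z_pop) / se      # compared with 2 for the 0.05 level

# Limits at twice the standard error, transformed back to the scale of r.
z_low, z_high = z_pop - 2 * se, z_pop + 2 * se
r_low, r_high = math.tanh(z_low), math.tanh(z_high)
```

Because the limits are symmetrical in z' but not in r, the skewness of
the distribution of r appears directly in `r_low` and `r_high`.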

The following is an example of the use of z' to test the significance
of the difference between two observed correlations. Table 7.2 shows
the correlation between number of stamens and pistils measured on
373 flowers collected late in the season to be 0.745, while a sample
of 268 flowers collected earlier gave a correlation coefficient of 0.506;
making the transformation, we find z' = 0.962 and 0.557, and the
difference, being 0.405 with a standard error of

$$\sqrt{\frac{1}{370} + \frac{1}{265}} = 0.080,$$

is quite significant.
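
A minimal Python sketch of this comparison follows; the function name
is our own and the code is an illustration, not a prescribed routine.

```python
import math

def z_difference_test(r1, n1, r2, n2):
    """Return (difference in z', its standard error, their ratio)."""
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return diff, se, diff / se

# Late flowers (373, r = 0.745) against early flowers (268, r = 0.506).
diff, se, ratio = z_difference_test(0.745, 373, 0.506, 268)
```

The ratio of the difference to its standard error is about five, well
beyond the value of two taken as the 0.05 level.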

The Combination of Estimates of a Correlation 
8.22. It sometimes happens that we have a number of correlation
coefficients estimated from several small samples from populations
having the same coefficient, and that we wish to combine them to
obtain a mean. This is best done by making the z' transformation,
finding the mean z' and then re-transforming back to r. If the
samples are not of equal size, we naturally give the larger ones more
weight than the smaller ones; but the weights should be proportional,
not to N, the size of the sample, but to (N − 3). Thus if
$z'_1, z'_2, \ldots$ are the individual estimates based on samples of
$N_1, N_2, \ldots$ pairs of observations, the weighted mean is

$$\bar{z}' = \frac{(N_1-3)z'_1 + (N_2-3)z'_2 + \cdots}{(N_1-3) + (N_2-3) + \cdots}$$

and the standard error of $\bar{z}'$ is

$$\frac{1}{\sqrt{(N_1-3) + (N_2-3) + \cdots}}.*$$


Table 8.2 contains the correlation coefficients between cephalic
index and upper face form for samples of skulls belonging to thirteen
races (Tschepourkowsky, 1905). The values of z' are in the fourth
column, and their mean, when weighted with (N − 3), and standard
error are

$$\bar{z}' = \frac{-42.408}{696} = -0.060\,9, \qquad \text{and} \qquad \frac{1}{\sqrt{696}} = \pm 0.037\,9.$$

The mean z' is not significant.
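
The weighted mean may be checked with the following Python sketch. It
is illustrative only: the names are our own, and the pairs (N, z') are
copied from Table 8.2 as tabulated, so that the rounding of z' to three
places is carried into the result.

```python
import math

# (N, z') for the thirteen samples of Table 8.2, as tabulated.
SAMPLES = [
    (66, 0.089), (77, 0.184), (53, -0.093), (60, -0.187), (32, 0.221),
    (39, -0.261), (44, 0.002), (19, -0.312), (32, -0.257), (34, -0.148),
    (47, -0.021), (80, -0.201), (152, -0.067),
]

def weighted_mean_z(samples):
    """Weighted mean of z' with weights (N - 3), and its standard error."""
    total_weight = sum(n - 3 for n, _ in samples)
    weighted_sum = sum((n - 3) * z for n, z in samples)
    return weighted_sum / total_weight, 1.0 / math.sqrt(total_weight)

mean_z, se_mean = weighted_mean_z(SAMPLES)
mean_r = math.tanh(mean_z)   # re-transformed to the scale of r
```

The mean is about 1.6 times its standard error, which agrees with the
conclusion that it is not significant.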

If an improved estimate of the correlation is obtained by combining
a large number of very small samples in this way, there is a
possibility of serious error in z' due to the fact that its distribution is
not quite normal and there is a slight bias. For this reason, the
number of coefficients combined should be small compared with 
the average size of sample. 


The Significance of Groups of Correlations 

8.23. If we have a group of correlation coefficients, however, and
wish to see if, as a whole, they show association, a test involving z'
and $\chi^2$ may be used. Since the standard error of z' is $1/\sqrt{N-3}$,
the quantity $z'\sqrt{N-3}$ is distributed approximately normally and

* In the language of section 3.7, $N_1 - 3$, $N_2 - 3$, etc., are the quantities of
information given by the samples. In finding the mean of z', these quantities
are the weights, and since the quantities are additive, the variance of the
mean is the inverse of the total quantity of information.
with unit standard deviation in samples from a population with zero
association, and we may calculate

$$\chi^2 = S\{z'^2(N-3)\},$$

where S is the summation over the number of samples available.
As we have indicated in section 4.5, this should be distributed
as the $\chi^2$ for g degrees of freedom if there are g samples, and we
may find P, the probability that random samples from the

TABLE 8.2

Correlation Coefficients between Cephalic Index and
Upper Face Index

Race                      Number of   Correlation     $z'$      $z'^2(N-3)$
                          Cases N     Coefficient r

Australians                  66        +0.089        +0.089       0.50
Negroes                      77        +0.182        +0.184       2.51
Duke of York Islanders       53        −0.093        −0.093       0.43
Malays                       60        −0.185        −0.187       1.99
Fijians                      32        +0.217        +0.221       1.42
Papuans                      39        −0.255        −0.261       2.45
Polynesians                  44        +0.002        +0.002       0.00
Alfourous                    19        −0.302        −0.312       1.56
Micronesians                 32        −0.251        −0.257       1.92
Copts                        34        −0.147        −0.148       0.68
Etruscans                    47        −0.021        −0.021       0.02
Europeans                    80        −0.198        −0.201       3.11
Ancient Thebans             152        −0.067        −0.067       0.67

                                            $\chi^2$ = 17.26
                                  Degrees of freedom = 13


hypothetical population would give $\chi^2$ as great as or greater than that
observed.

When testing the correlations of Table 8.2 by finding the mean z',
we implicitly assumed them to be samples from the same population
and regarded the variations in r as due to random errors. It
may be, however, that the variations are real racial differences, and
that in general there is a correlation between cephalic index and
face form, which is sometimes positive and sometimes negative, so
that the mean is nearly zero; we can now test this, using z' and $\chi^2$.
The values of $z'^2(N-3)$ are given in the fifth column of the table,
and their sum gives a $\chi^2$ of 17.26, which for 13 degrees of freedom
lies between the 0.1 and 0.2 levels of significance. Thus the combined
experience of Table 8.2 lends no support to the view that
the two characters are associated, even after making allowance for
the possibility of racial differences.
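
This test may be sketched in Python as below. The sketch is an
illustration under stated assumptions: the names are our own, the data
are the tabulated (N, z') pairs of Table 8.2, and the two points of
$\chi^2$ for 13 degrees of freedom are taken from the usual tables rather
than computed. Summing unrounded terms gives a total very slightly below
the 17.26 obtained by adding the rounded column.

```python
# (N, z') for the thirteen samples of Table 8.2, as tabulated.
SAMPLES = [
    (66, 0.089), (77, 0.184), (53, -0.093), (60, -0.187), (32, 0.221),
    (39, -0.261), (44, 0.002), (19, -0.312), (32, -0.257), (34, -0.148),
    (47, -0.021), (80, -0.201), (152, -0.067),
]

def chi_square_from_zero(samples):
    """Sum of z'^2 (N - 3): each term is a squared unit normal deviate
    when the population correlation is zero."""
    return sum(z * z * (n - 3) for n, z in samples)

chi2 = chi_square_from_zero(SAMPLES)
degrees = len(SAMPLES)

# Tabulated points of chi-square for 13 d.f. (assumed from tables).
CHI2_13_P20 = 16.985   # P = 0.2
CHI2_13_P10 = 19.812   # P = 0.1
between_levels = CHI2_13_P20 < chi2 < CHI2_13_P10
```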

Having found a mean z' from a number of samples by the method
of section 8.22, and found it to be significant, we may wish to

TABLE 8.3

Correlations between Head Length and Breadth

District      Number of   Correlation      $z'$      $z' - \bar{z}'$   $(z' - \bar{z}')^2(N-3)$
              Cases N     Coefficient r

Alexandria       643        +0.244       +0.249     −0.017 8          0.20
Cairo            802        +0.244       +0.249     −0.017 8          0.25
Canal            127        +0.330       +0.343     +0.076 2          0.72
Beheira          526        +0.213       +0.216     −0.050 8          1.35
Gharbiya       1 104        +0.316       +0.327     +0.060 2          3.99
Minufiya         717        +0.230       +0.234     −0.032 8          0.77



test if the individual values as a whole differ significantly among
themselves, and the $\chi^2$ distribution may be used for this. If z' is an
individual transformed correlation, $\bar{z}'$ the mean and N the size
of the individual sample,

$$\chi^2 = S\{(z' - \bar{z}')^2(N-3)\},$$

and the number of degrees of freedom is one less than the number
of samples (if the mean has been found from them). As an example,
we will consider Table 8.3, which contains correlations between
head length and breadth for Egyptians native to six districts; the
data have been taken from a paper by Orensteen (1920). The weighted
mean $\bar{z}'$ is +0.266 8 (giving r = +0.260), and the sum of the last
column is $\chi^2$ = 7.28; for 5 degrees of freedom, this lies almost
exactly on the P = 0.2 level, and we must conclude that there is no
evidence of any difference in correlation among the six districts
chosen. Comparing the Beheira and Gharbiya districts, we find the
difference in z' is 0.111 with a standard error of

$$\pm\sqrt{\frac{1}{523} + \frac{1}{1\,101}} = \pm 0.053;$$



this difference is about twice the standard error, and if the two 
samples came alone, it might be regarded as significant; but we must 
remember that it is just about the most significant one that could 
be taken from the six values, and bearing in mind the remarks in 
section 3.33 regarding significances of differences within groups of 
samples, we should compare the ratio 0.111/0.053, not with 2.0,
the 0.05 level for a random pair, but with 2.8, the 0.05 level of
the range for six samples (Table 3.3).
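
The computations for Table 8.3 may be sketched as follows. The Python is
illustrative only; the names are our own and the values of z' are those
tabulated.

```python
import math

# (district, N, z') from Table 8.3.
DISTRICTS = [
    ("Alexandria", 643, 0.249), ("Cairo", 802, 0.249), ("Canal", 127, 0.343),
    ("Beheira", 526, 0.216), ("Gharbiya", 1104, 0.327), ("Minufiya", 717, 0.234),
]

weights = [n - 3 for _, n, _ in DISTRICTS]
z_values = [z for _, _, z in DISTRICTS]

# Weighted mean of z' and the chi-square of the deviations from it.
mean_z = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
chi2 = sum(w * (z - mean_z) ** 2 for w, z in zip(weights, z_values))
degrees = len(DISTRICTS) - 1      # one degree absorbed by the fitted mean

# Beheira against Gharbiya, the most divergent pair.
diff = 0.327 - 0.216
se_diff = math.sqrt(1.0 / (526 - 3) + 1.0 / (1104 - 3))
ratio = diff / se_diff            # to be judged against 2.8, not 2.0
```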


Significances of Regression Constants

8.3. We have seen in section 8.1 how the analysis of variance
technique and the z- and t-tests may be used for testing the
significance of a correlation coefficient; here we shall extend these
methods to testing a number of hypotheses concerning regression
coefficients and lines. The hypotheses will be given in italics at the
head of the sections.

8.31. Hypothesis: That the population value of the regression coefficient
is zero.

This is the same as postulating a population with zero correlation,
and the test of section 8.1 may be used. However, we shall express
the equation in new terms.

In the notation of the last chapter, let the regression line of the
sample be

$$(y - \bar{y}) = a(x - \bar{x}),$$

and the standard deviation of the residual deviations from the line
be s, where

$$s^2 = \frac{S(y-\bar{y})^2 - a^2 S(x-\bar{x})^2}{N-2}.$$

Then it is easy by means of equations (7.4) and (7.5) to transform
equation (8.2) to

$$t = \frac{a\sqrt{S(x-\bar{x})^2}}{s},$$

where s is estimated on N − 2 degrees of freedom.




Since t has been described in section 5.21 as the ratio of a quantity
to its standard error, we may say that the standard error of the
regression coefficient in samples from a population with zero
association is

$$\frac{s}{\sqrt{S(x-\bar{x})^2}},$$

and the method of section 5.3 may be used to test any significance.


8.32. Hypothesis: That the population value of the regression coefficient
is $\alpha$.

It is easy to show that the above expression for the standard error
of the regression coefficient applies for any population value, $\alpha$, so
the test of the hypothesis is equivalent to using the t-test to see if
the ratio of $a - \alpha$ to its standard error is significant for N − 2
degrees of freedom.
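
A short Python sketch of this test follows. It is an illustration only:
the function name and its `alpha` argument are our own, and the one-day
mortar and concrete strengths of Table 8.4 are borrowed simply to have
figures to work on. With `alpha` set to zero the result should agree
with equation (8.2) computed from r.

```python
import math

def regression_t(xs, ys, alpha=0.0):
    """t for the hypothesis that the population regression coefficient is alpha."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx                                  # sample regression coefficient
    s = math.sqrt((syy - a * sxy) / (n - 2))       # residual standard deviation
    se_a = s / math.sqrt(sxx)                      # standard error of a
    r = sxy / math.sqrt(sxx * syy)                 # correlation coefficient
    return a, se_a, (a - alpha) / se_a, r

# One-day mortar (x) and concrete (y) strengths from Table 8.4.
mortar = [263, 493, 137, 477, 233, 568, 230]
concrete = [138, 278, 49, 293, 104, 350, 141]

a, se_a, t, r = regression_t(mortar, concrete)
t_from_r = r * math.sqrt((len(mortar) - 2) / (1 - r * r))   # equation (8.2)
```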

In large samples where N can be substituted for N − 2, the
standard error of the regression coefficient reduces to

$$\frac{\sigma_y}{\sigma_x}\sqrt{\frac{1-\rho^2}{N}},$$

where $\rho$ is the population value of the correlation coefficient.


8.33. Hypothesis: That a number of samples are from populations
having the same regression coefficient. The residual variance of y is
assumed to be constant.

Consider the data of Table 8.4, taken from a paper by Glanville
and Reid (1934). They are the crushing strengths of specimens of
mortar and concrete made from seven different cements; specimens
were broken at three different ages. These data form part of the
result of an investigation into the use of tests on mortar as a guide
to the behaviour of the cement in concrete. If the corresponding
strengths of mortar and concrete are plotted a marked correlation
will be apparent, and we may ask if the slopes of the regression
lines of concrete strength on mortar strength for the three ages are
significantly different. Whether or not the means for the three ages
are different does not here concern us, nor the possibility of any
variations in the residual variance of concrete strength from one age
to another.

We may eliminate the mean strengths for the three ages by
measuring the individual strengths as deviations from those means.
Then we may either (1) fit one regression line to all the deviations,
which is equivalent to fitting a series of parallel lines going through
the means for the various ages, or (2) fit separate regression lines
for the ages. The data are plotted in Fig. 11, where the parallel

TABLE 8.4

Crushing Strength of Mortar and Concrete, lb. per sq. in. ÷ 10

                                    One Day            Seven Days       Twenty-eight Days
Cement                          Mortar  Concrete    Mortar  Concrete    Mortar  Concrete
                                   x       y           x       y           x       y

A                                 263     138         750     507         895     679
B                                 493     278         936     653       1 066     818
C                                 137      49         453     425         632     651
D                                 477     293         893     662       1 100     842
E                                 233     104         545     437         716     603
F                                 568     350         797     735         897     832
G                                 230     141         631     439         846     724

$S_s(x - \bar{x}_s)^2$                    164 786             193 074             173 057
$S_s(y - \bar{y}_s)^2$                     76 259              99 933              55 482
$S_s(x - \bar{x}_s)(y - \bar{y}_s)$       111 205             117 648              82 646



lines of system (i) are full lines and the separate ones of system (2) 
are dotted. The dotted lines fit the data more closely than the full 
ones in the sense that the sum of squares of deviations of concrete 
strength is less from the dotted lines. The separate regressions are 
only significantly different, however, if this difference in sums of 
squares is real after allowing for degrees of freedom absorbed in 
fitting and for random errors. This is tested by calculating residual 
variances. If the hypothesis is correct, the lines with separate slopes 
will give the same residual variance as those with the common slope, 
within the limits of sampling errors;* if the second residual variance 
is significantly less than the first, the hypothesis is rejected and it 
is inferred that the separate lines give a closer representation of the 

* This follows from the proof by Fisher (1925) mentioned in section 7.23.
data than the common line and are significantly different in slope.
We shall set out the process algebraically. 

If there are u samples of $N_1, N_2, \ldots$ individuals and we fit a separate
regression line to each, viz.:

$$(y - \bar{y}_1) = a_1(x - \bar{x}_1), \qquad (y - \bar{y}_2) = a_2(x - \bar{x}_2), \qquad \text{etc.},$$

Fig. 11

the residual sums of squares and corresponding degrees of freedom
after fitting the lines are given by equation (7.92) as:

Sums of squares:

$$S_1(y-\bar{y}_1)^2 - \frac{[S_1(x-\bar{x}_1)(y-\bar{y}_1)]^2}{S_1(x-\bar{x}_1)^2}, \qquad S_2(y-\bar{y}_2)^2 - \frac{[S_2(x-\bar{x}_2)(y-\bar{y}_2)]^2}{S_2(x-\bar{x}_2)^2}, \qquad \text{etc.}$$

Degrees of freedom: $N_1 - 2$, $N_2 - 2$, etc.     (8.4)

where $S_1, S_2, \ldots$ are summations over all individuals in the respective
samples. The sum of these sums of squares divided by
$(N_1 + N_2 + \cdots - 2u)$, the sum of the degrees of freedom, provides
an estimate of the residual variance of y in the population,
which is assumed to be the same for all samples.

Now, assuming the samples to have one regression coefficient but
different means of y for a given value of x, the residual sum of squares
from the regression lines with common slope, and the degrees of
freedom are:

Sum of squares:

$$S_1(y-\bar{y}_1)^2 + S_2(y-\bar{y}_2)^2 + \cdots - \frac{[S_1(x-\bar{x}_1)(y-\bar{y}_1) + S_2(x-\bar{x}_2)(y-\bar{y}_2) + \cdots]^2}{S_1(x-\bar{x}_1)^2 + S_2(x-\bar{x}_2)^2 + \cdots}$$

Degrees of freedom: $N_1 + N_2 + \cdots - u - 1$.     (8.41)

The degrees of freedom are made up of the N − 1 contributed by
each sample minus the one absorbed in fitting the common regression.
The sum of squares divided by the degrees provides another estimate
of the residual variance of y.

We cannot test the difference between the two variances obtained
from (8.4) and (8.41) for significance by the z test, because the
variances are not independent. However, if we subtract the sum of
squares of (8.4) from that of (8.41) and divide by the corresponding
difference in degrees, we have an estimate of the variance that is
independent of the estimate of (8.4), and may be compared with it.
The differences in sum of squares and degrees of freedom are:

Sum of squares:

$$\frac{[S_1(x-\bar{x}_1)(y-\bar{y}_1)]^2}{S_1(x-\bar{x}_1)^2} + \frac{[S_2(x-\bar{x}_2)(y-\bar{y}_2)]^2}{S_2(x-\bar{x}_2)^2} + \cdots - \frac{[S_1(x-\bar{x}_1)(y-\bar{y}_1) + S_2(x-\bar{x}_2)(y-\bar{y}_2) + \cdots]^2}{S_1(x-\bar{x}_1)^2 + S_2(x-\bar{x}_2)^2 + \cdots}$$

Degrees of freedom: $u - 1$.     (8.42)

Then z may be calculated from the ratio of the variances estimated
from (8.42) and (8.4) and tested for significance for degrees of
freedom, $n_1 = u - 1$, $n_2 = N_1 + N_2 + \cdots - 2u$.
For the data of Table 8.4, the total sum of squares of concrete
strengths from equation (8.4) is*

$$76\,259 - \frac{(111\,205)^2}{164\,786} + \cdots = 45\,471,$$

and there are 15 degrees of freedom, giving a variance of 3 031.
The sum of squares in equation (8.42) is

$$\frac{(111\,205)^2}{164\,786} + \cdots - \frac{(311\,499)^2}{530\,917} = 3\,441,$$

and for 2 degrees of freedom this gives as an estimate of variance
the value 1 720. This is not greater than the variance from (8.4),
and the slopes of the lines are not significantly different.
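
The whole calculation can be sketched from the three rows of sums at the
foot of Table 8.4. The Python below is an illustration only; the names
are our own, and the figures are in units of 10 lb. per sq. in.

```python
# Within-age sums of squares and products from the foot of Table 8.4,
# ordered one, seven and twenty-eight days.
SXX = [164786, 193074, 173057]
SYY = [76259, 99933, 55482]
SXY = [111205, 117648, 82646]
N_PER_SAMPLE = 7

def slope_homogeneity(sxx, syy, sxy, n):
    """Sums of squares and degrees of freedom of (8.4), (8.41) and (8.42)."""
    u = len(sxx)
    # (8.4): residuals about a separate line for each sample.
    separate = sum(yy - xy ** 2 / xx for xx, yy, xy in zip(sxx, syy, sxy))
    # (8.41): residuals about parallel lines with one common slope.
    common = sum(syy) - sum(sxy) ** 2 / sum(sxx)
    return {
        "separate": (separate, u * n - 2 * u),
        "common": (common, u * n - u - 1),
        "difference": (common - separate, u - 1),   # (8.42)
    }

result = slope_homogeneity(SXX, SYY, SXY, N_PER_SAMPLE)
var_residual = result["separate"][0] / result["separate"][1]
var_slopes = result["difference"][0] / result["difference"][1]
```

Since `var_slopes` falls below `var_residual`, no reference to the
tables of z is needed to see that the slopes do not differ
significantly.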


8.34. Hypothesis: That two samples are from populations having the same
regression coefficient.

The above test may be reduced to the t-test when there are only
two samples. The variance from (8.42) is, in this instance, based
on one degree of freedom, and the square root of its ratio to the
variance from (8.4) is t.

If the residual variance estimated from (8.4) is written $s^2$ and the
regression coefficients of the two samples are $a_1$ and $a_2$ it is easy to
show from equations (7.5), (8.4) and (8.42) that

$$t = \frac{a_1 - a_2}{s\sqrt{\dfrac{1}{S_1(x-\bar{x}_1)^2} + \dfrac{1}{S_2(x-\bar{x}_2)^2}}}$$

where s is estimated from $N_1 + N_2 - 4$ degrees of freedom.

This test is parallel to that given in section 5.3 for the difference
between two means. It follows from this that the standard error of
the difference between the two regression coefficients may be derived
from those of the separate coefficients by the standard method, using
equation (3.1), p. 72.

* All results for this example are in units of 10 lb. per sq. in.

As an example we may consider the seven and twenty-eight day
results of Table 8.4. The residual variance $s^2$ is obtained from
equation (8.4) applied to the last two ages, and is 4 426 based on
10 degrees of freedom. The standard error of the difference between
the two regression coefficients is

$$\sqrt{4\,426\left(\frac{1}{193\,074} + \frac{1}{173\,057}\right)} = 0.220.$$

The regression coefficients are

$$\frac{117\,648}{193\,074} = 0.609\,3 \qquad \text{and} \qquad \frac{82\,646}{173\,057} = 0.477\,6.$$

Hence, t = 0.6, and for 10 degrees of freedom this is near the
0.6 level; the difference in regression coefficients is not significant.

8.35. Hypothesis: That a number of samples are from populations
having the same regression line. The residual variance of y is assumed
to be constant.

We are postulating here, not only that the true regression lines
for the separate samples are parallel, but that they coincide. The
true means and variances of the independent variable x need not
be equal for the populations, so those of y are not necessarily equal,
but if our hypothesis is true, the true mean value of y and the
variance for any given value of x must be the same for all samples.
This hypothesis is not quite the same as postulating that the samples
come from the same population. Again, we are not considering the
possibility of the residual variance of y changing from sample to
sample, and so are not testing it.

The general procedure is the same as that used in section 8.33.
We fit the separate lines shown dotted in Fig. 11 and estimate a
residual sum of squares and variance from equation (8.4). The
common line is shown in Fig. 11 by a broken line; it is fitted to
the deviations of the values from the grand means for all samples.
If $\bar{x}$ and $\bar{y}$ are the grand means and S is the summation over all
individuals, we have for the residuals from this line:

Sum of squares:

$$S(y-\bar{y})^2 - \frac{[S(x-\bar{x})(y-\bar{y})]^2}{S(x-\bar{x})^2}$$

Degrees of freedom: $N_1 + N_2 + \cdots - 2$.     (8.43)

The difference between the sums of squares in (8.43) and (8.4)
divided by the difference in degrees of freedom, 2u − 2, is the
independent estimate of variance that may be compared with the
residual variance from (8.4). There is no further algebraic reduction
of the expressions.
We shall test the hypothesis on the data of Table 8.4 analysing
the variance of the concrete strengths. The sum of squares of
deviations from the common line is 111 792, the sum of squares
of deviations from the separate lines is 45 471 and the difference is
66 321 for 19 − 15 = 4 degrees, giving as an estimate of variance
the value 16 580. The residual variance based on 15 degrees is 3 031,
and since this is significantly less than the estimate based on four
degrees (z = 0.85 lies beyond the 0.01 point), we conclude that
the regression lines differ. In this instance, since we have shown in
section 8.33 that the lines do not differ significantly in slope, we
may say that they differ only in level.
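
The test may be sketched from the individual readings of Table 8.4. The
Python below is an illustration only; the names are our own, and the
figures are in units of 10 lb. per sq. in.

```python
import math

# Table 8.4: (mortar x, concrete y) for cements A to G at each age.
DATA = {
    "one day": [(263, 138), (493, 278), (137, 49), (477, 293),
                (233, 104), (568, 350), (230, 141)],
    "seven days": [(750, 507), (936, 653), (453, 425), (893, 662),
                   (545, 437), (797, 735), (631, 439)],
    "twenty-eight days": [(895, 679), (1066, 818), (632, 651), (1100, 842),
                          (716, 603), (897, 832), (846, 724)],
}

def residual_ss(pairs):
    """Sum of squares of y about the regression line fitted to the pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    return syy - sxy ** 2 / sxx

u = len(DATA)
n_total = sum(len(pairs) for pairs in DATA.values())

# Separate lines for each age, then a single line through all readings.
ss_separate = sum(residual_ss(pairs) for pairs in DATA.values())
ss_single = residual_ss([pt for pairs in DATA.values() for pt in pairs])

df_separate = n_total - 2 * u
df_single = n_total - 2
var_lines = (ss_single - ss_separate) / (df_single - df_separate)
var_residual = ss_separate / df_separate
z = 0.5 * math.log(var_lines / var_residual)
```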

To test for a difference in level, assuming the slope to be constant, 
it would be legitimate to compare the residual after fitting a number 
of lines having different means but the same slope, with the residual 
after fitting a common line. Such a test is used in section 11.11.

Effect of Non-uniformity of Variance

8.36. The tests for these hypotheses are based on the assumption
that the residual variances are the same for the populations sampled. 
It is probable that the tests are appreciably affected if the residual 
variances change from one of the populations to another by more 
than a small amount. Where heterogeneity of variance is suspected, 
and a result of importance is very near the level of significance, it is 
well to test this assumption by the methods of sections 5.4 and 5.5. 

For example, the separate residual variances for concrete of the 
three ages in Table 8.4 are 243, 5 649 and 3 203, each being based 
on 5 degrees, and the first is significantly less than the other two. 
In consequence, some little doubt may be thrown on the inferences 
of the sections 8.33 and 8.35. We do not propose attempting to 
extend the statistical analysis to deal with such instances, and if 
the matter is of importance suggest either that more data should 
be procured or that here the hypotheses could be tested on the 
7- and 28-day values only. 
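
The three residual variances quoted may be checked by the following
Python sketch. It is illustrative only: the names are our own, and the
5 per cent. point of the variance ratio for 5 and 5 degrees of freedom
(5.05) is assumed from the usual tables as a rough yardstick, in place
of the methods of sections 5.4 and 5.5.

```python
# Within-age sums from the foot of Table 8.4: (Sxx, Syy, Sxy), seven pairs each.
SUMS = {
    "one day": (164786, 76259, 111205),
    "seven days": (193074, 99933, 117648),
    "twenty-eight days": (173057, 55482, 82646),
}
DF = 7 - 2   # degrees of freedom left after fitting a line to seven pairs

residual_variance = {
    age: (syy - sxy ** 2 / sxx) / DF for age, (sxx, syy, sxy) in SUMS.items()
}

# Ratios of the later variances to the one-day variance.
F_05_5_5 = 5.05   # assumed from tables
ratios = {
    age: v / residual_variance["one day"]
    for age, v in residual_variance.items() if age != "one day"
}
```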

Sheppard’s Corrections 

8.4. In calculating correlation coefficients from samples of grouped
data, Sheppard’s corrections are often used in determining $\sigma_x$ and
$\sigma_y$, and they tend to increase the value of r. For this reason it is
better to ignore them in making tests of significance, particularly 
when the grouping is broad, although their use probably leads to an 
estimate of r which is nearer the population value. 



CHAPTER IX 


NON-LINEAR REGRESSION AND THE MEASUREMENT 
OF ASSOCIATION 

The Correlation Ratio 

Fig. 12

9.1. We have seen that the correlation coefficient measures the
proportion of total variance of a quantity associated with a straight
regression line, but if the true regression is not linear, i.e. if the
array means of one quantity tend to lie on a curve which deviates
sensibly from a straight line, the correlation coefficient may be a very
inadequate representation of the true state of affairs. It is possible
to imagine a case in which the two regression curves are somewhat
like those shown in Fig. 12, one being a straight line and the other
a curve more or less symmetrical about a vertical axis. The correlation
coefficient would be nearly zero (for the best straight line that
could be drawn to fit the curve would be parallel to the x-axis),
although there might be a high association as expressed by the curve;
we are not aware of any such extreme case, but the same considerations
apply when the curvature is less marked. In such instances, the
only thing to do is to present the correlation table, to give the array
means and perhaps to fit a curve to them; methods of performing
this last operation will be discussed later. Professor K. Pearson (1905)
has proposed a constant called the correlation ratio as a measure of
association analogous to the correlation coefficient, and suitable
for use when the regression is not linear. If x is an individual observation
of one variate, $\bar{x}$ the grand mean and $\bar{x}_a$ the mean of any individual
array, the correlation ratio of x on y is $\eta_{xy}$ where

$$\eta_{xy}^2 = \frac{S(\bar{x}_a - \bar{x})^2}{S(x - \bar{x})^2}$$

and the correlation ratio of y on x is $\eta_{yx}$ where

$$\eta_{yx}^2 = \frac{S(\bar{y}_a - \bar{y})^2}{S(y - \bar{y})^2} \qquad (9.1)$$

S being the summation over all individuals. If we now refer back
to Table 6.3, p. 130, we see that

$$(1 - \eta_{xy}^2) = \frac{\text{Sum of squares of deviations from array means of } x}{\text{Total sum of squares}}$$

and thus $\eta$ is analogous to the correlation coefficient r, since from
Table 7.61, p. 158, we see that

$$(1 - r^2) = \frac{\text{Sum of squares of deviations from linear regression}}{\text{Total sum of squares}}$$

Hence $r^2$ is the proportion of the total sum of squares of x associated
with y when a straight line is fitted, and $\eta_{xy}^2$ is the proportion associated
when a series of array means or curve is fitted; just as there are two
regression lines (for which r happens to be the same), there are two
series of array means which require separate correlation ratios. If
the sample is large compared with the number of arrays and we may
neglect the degrees of freedom absorbed, and if $\sigma_x$ is the standard
deviation of x and $\sigma'$ that of the residual deviations, we have as an
approximation,

for linear regression $\sigma'^2/\sigma_x^2 = (1 - r^2)$,

for curved regression $\sigma'^2/\sigma_x^2 = (1 - \eta_{xy}^2)$.

In large samples, therefore, $\eta_{xy}$ may be interpreted as a measure of
association on practically the same scale as r. We have, however,
to stipulate that the number of arrays shall be small compared
with the size of the sample (say one-fiftieth). The usefulness of this
ratio is increased by the fact that the quantity y does not enter into
the expression for $\eta_{xy}$, so that the second variate need not even be
quantitative; we have thus a measure of association which may be
used in such analyses as those of Tables 6.1 and 6.5, pp. 127 and 133.
From those tables we see that the correlation ratios are:

Number of ovules per ovary,

$$1 - \eta^2 = \frac{3\,026.350}{5\,379.775}, \qquad \eta^2 = 0.437, \qquad \eta = 0.66,$$

Length of cuckoos’ eggs,

$$1 - \eta^2 = \frac{2\,356.21}{3\,429.70}, \qquad \eta^2 = 0.313, \qquad \eta = 0.56,$$


and the strength of association in the former case comes somewhere 
between those of the pistils and stamens of the late and early flowers 
(Tables 7.2 and 7.3, pp. 145 and 146), while in the second case it 
is practically the same as the relationship between the pistils and 
stamens for early flowers. 
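
The arithmetic of the correlation ratio can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, the first function works from raw arrays, and the second simply repeats the division done above from the sums of squares of Tables 6.1 and 6.5. The within-array sum of squares for the cuckoos' eggs is taken as 2 356.21, the value consistent with the quoted result of 0.313.

```python
def correlation_ratio_squared(arrays):
    """Return eta^2 of x for a list of arrays, each a list of x values."""
    all_values = [x for array in arrays for x in array]
    grand_mean = sum(all_values) / len(all_values)
    total_ss = sum((x - grand_mean) ** 2 for x in all_values)
    within_ss = 0.0
    for array in arrays:
        array_mean = sum(array) / len(array)
        within_ss += sum((x - array_mean) ** 2 for x in array)
    return 1.0 - within_ss / total_ss


def eta_from_sums(within_ss, total_ss):
    """Return (eta^2, eta) from the two sums of squares."""
    eta_squared = 1.0 - within_ss / total_ss
    return eta_squared, eta_squared ** 0.5


ovules = eta_from_sums(3026.350, 5379.775)   # about (0.437, 0.66)
cuckoos = eta_from_sums(2356.21, 3429.70)    # about (0.313, 0.56)
```

Because only the arrangement of x into arrays enters the calculation, the second variate may be a purely qualitative label.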

One disadvantage of the correlation ratio, however, is that it is not
independent of the number of groups from which it is calculated;
that is, it is not entirely a property of the population nor even of
the sample, but depends on its arrangement into arrays. If there are
m arrays each with n observations (so that the total in the sample
is N = mn), if $\sigma_s^2$ and $\sigma_r^2$ are the squares of the standard deviation
of array means and residual deviations in the infinite population, and
if $v_s$ and $v_r$ are the corresponding variances, we see from Table 6.3,
p. 130, that


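$$\frac{1}{\eta^2} = 1 + \frac{m(n-1)\,v_r}{(m-1)\,v_s},$$

or, replacing $v_r$ and $v_s$ by the quantities they estimate, $\sigma_r^2$ and $\sigma_r^2 + n\sigma_s^2$,

$$\frac{1}{\eta^2} = 1 + \frac{(N-m)\,\sigma_r^2}{(m-1)(\sigma_r^2 + n\sigma_s^2)}, \qquad (9.2)$$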


and this shows that $\eta^2$ depends on n and m. If the number in each
array (n) is so large that $\sigma_r^2$ may be neglected in comparison with
$n\sigma_s^2$, and n may be written for (n - 1), equation (9.2) may be written

$$\frac{1}{\eta^2} = 1 + \frac{m}{m-1}\cdot\frac{\sigma_r^2}{\sigma_s^2}, \qquad (9.21)$$

and the larger m is, the smaller is m/(m - 1), or the larger is $\eta^2$.
Thus it is important to keep the number of groups small so as to
avoid over-emphasising the association.
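
The dependence on the number of arrays can be seen numerically. The sketch below is an illustration under stated assumptions, not a formula quoted from the text: it takes the between-array and residual variances at the values they estimate, $\sigma_r^2 + n\sigma_s^2$ and $\sigma_r^2$, and forms $\eta^2$ from the corresponding sums of squares. The function name and the trial values are invented.

```python
def expected_eta_squared(m, n, sigma_s2, sigma_r2):
    """Approximate eta^2 for m arrays of n observations each.

    Assumes v_r is replaced by sigma_r2 and v_s by sigma_r2 + n * sigma_s2.
    """
    between_ss = (m - 1) * (sigma_r2 + n * sigma_s2)
    within_ss = m * (n - 1) * sigma_r2
    return between_ss / (between_ss + within_ss)


# Same population variances, same n, different numbers of arrays.
few_arrays = expected_eta_squared(5, 50, 1.0, 4.0)     # about 0.18
many_arrays = expected_eta_squared(20, 50, 1.0, 4.0)   # about 0.21

# With no real association (sigma_s2 = 0) the value is (m - 1)/(N - 1), not zero.
no_association = expected_eta_squared(10, 10, 0.0, 1.0)
```

Even with no association at all the ratio is not zero, which is one way of seeing why a fine grouping exaggerates the apparent association.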

Since $\eta^2$ is related to a ratio of variances, it may be used instead
of z as a test for significance of association. However, such a test
is not convenient and is seldom used, and we shall not deal with it
further.

Tests for Non-Linearity of Regression 
9.2. We shall consider the data of Table 9.1 (Zinn, 1923), relating
the loaf volume and percentage protein in a number of Ohio com- 
mercial wheats. The interest lies in predicting the loaf volume from 
the protein content, and so we consider the regression of the former 
on the latter. The correlation coefficient (found without Sheppard's
corrections) is +0.529 78, showing that the loaf volume does, on
the whole, increase with the protein content. If the table is examined, 
however, there seems to be a tendency for the loaf volume to remain 
practically constant for wheats of low protein content, and for the 
relationship between the two quantities to be more marked for 
wheats with more protein, i.e. for the regression to be non-linear. 
Is this apparent tendency real, or is it due to errors of random 
sampling? This section is concerned with tests of significance of 
such departures from linearity. 

The general method is based on the analysis of variance. If the 
regression is non-linear, it will be almost completely expressed by 
the mean values of loaf volume for the separate arrays, and the
residual variance calculated by the methods of section 6.4 will be 
as low as is possible. A straight regression line fitted to the same 
data will have a significantly greater residual variance, because the 
residual will contain variance due to the deviations from linearity. 
On the other hand, if the regression is linear, it may easily be shown 
that these two residual variances will be estimates of the same 
variance. The test for non-linearity therefore resolves itself into a 
test of significance of the difference between the two estimates. 
However, they are not independent, and so cannot be compared 
directly by the z-test, but the difference between the sums of squares
divided by the difference in degrees of freedom is independent of 
the residual variance of deviations from the array means, and these 
two estimates of variance may be compared. 
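
The partition described above can be sketched in Python as follows. This is a minimal illustration, assuming each array is identified by a single x value; the function name, the dictionary layout and the small data set are all invented for the example and are not Zinn's data.

```python
import math


def nonlinearity_test(arrays):
    """Analysis of variance test for departure from linear regression.

    `arrays` maps each x value to the list of y observations in that array.
    """
    xs, ys = [], []
    for x, group in arrays.items():
        xs.extend([x] * len(group))
        ys.extend(group)
    n_obs, n_arrays = len(ys), len(arrays)
    x_bar = sum(xs) / n_obs
    y_bar = sum(ys) / n_obs

    total_ss = sum((y - y_bar) ** 2 for y in ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    linear_ss = sxy ** 2 / sxx                       # 1 degree of freedom

    within_ss = 0.0                                  # N - m degrees of freedom
    for group in arrays.values():
        mean = sum(group) / len(group)
        within_ss += sum((y - mean) ** 2 for y in group)

    # Deviations of array means from the line, found by subtraction.
    deviation_ss = total_ss - within_ss - linear_ss  # m - 2 degrees of freedom
    df = (n_arrays - 2, n_obs - n_arrays)
    variance_ratio = (deviation_ss / df[0]) / (within_ss / df[1])
    z = 0.5 * math.log(variance_ratio) if variance_ratio > 0 else float("-inf")
    return {"deviation_ss": deviation_ss, "within_ss": within_ss,
            "df": df, "variance_ratio": variance_ratio, "z": z}


# Made-up data, purely for illustration.
example = nonlinearity_test({1: [1.0, 2.0], 2: [2.0, 3.0], 3: [6.0, 7.0]})
```

The value of z (or the variance ratio) is then referred to tables with the two numbers of degrees of freedom returned in `df`.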

The sums of squares and degrees of freedom for a sample of
N observations divided into m arrays, with $n_s$ observations in the
sth array, are given in Table 9.2, where all the symbols have the
meanings given to them in Chapters VI and VII. All the sums of
squares given directly are taken from Tables 6.3 (writing y for x
and adapting it for non-uniform array totals) and 7.6. It is recom-
mended that the sum of squares in the second row should be obtained
by subtraction.

TABLE 9.3

Analysis of Variance of Loaf Volumes

(Units, 10 000 c.c.²)

Source of Variation                      Sums of     Degrees of    Mean
                                         Squares     Freedom       Variance

Between arrays:
  Linear regression                        64.66          1         64.66
  Deviations from linear regression        13.55          7          1.94
  (Between arrays, total)                  78.21          8           -
Residual within arrays                    152.15         91          1.67

Total                                     230.36         99           -


The analysis for the data of Table 9.1 is in Table 9.3. The
variances to be compared give a value of z of

$$z = \tfrac{1}{2}\log_e\frac{1.94}{1.67} = 0.07,$$

and since this is well below the 5 per cent level of significance,
there is no evidence that the regression departs from linearity.
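
The figures of Table 9.3 can be checked with a short calculation. In the sketch below the sums of squares and the total of 99 degrees of freedom are those of the table; the individual degrees of freedom (1, 7 and 91) are an inference from the ratio of each sum of squares to its mean variance, and should be read as an assumption.

```python
import math

sums_of_squares = {"linear": 64.66, "deviations": 13.55, "residual": 152.15}
# Assumed split of the 99 degrees of freedom, inferred from SS / mean variance.
degrees_of_freedom = {"linear": 1, "deviations": 7, "residual": 91}

variances = {source: sums_of_squares[source] / degrees_of_freedom[source]
             for source in sums_of_squares}

# Deviations from linearity compared with the residual within arrays.
variance_ratio = variances["deviations"] / variances["residual"]
z = 0.5 * math.log(variance_ratio)   # roughly 0.07
```

A variance ratio so little above unity, with 7 and 91 degrees of freedom, is far from any usual level of significance.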

It will be noticed that the distribution of protein content is far 
from normal, but that does not invalidate our test, which depends 
only on the assumption that the distribution of loaf volume within 
the arrays is practically normal. As far as can be judged from a 
sample of this size, this assumption is justified. 

Polynomial Regression Curves 

9.3. Sometimes it is convenient to represent the regression of y on x
by a smooth curve described by an equation with several constants
that have to be determined from the data. From the statement in
the footnote in section 7.23 it will be appreciated that such an 
equation having only a few constants may absorb fewer degrees of 
freedom than a number of array means, leaving more degrees for 
the estimation of the residual. When there are only a few observa- 
tions, they cannot be grouped in arrays and the use of an equation 
may be essential in the analysis of variance. 

The most useful form of equation is the polynomial 

$$Y = a + bx + cx^2 + \ldots \qquad (9.3)$$

and according to the method of least squares the constants a, b, c, 
etc., may be obtained by solving a series of linear equations 

$$\begin{aligned}
Sy &= aS1 + bSx + cSx^2 + \ldots \\
Sxy &= aSx + bSx^2 + cSx^3 + \ldots \\
Sx^2y &= aSx^2 + bSx^3 + cSx^4 + \ldots
\end{aligned} \qquad (9.4)$$

using as many equations as there are constants. The summations 
are taken over all individuals and may be determined by fairly 
straightforward extensions of the methods of sections 1.3 and 7.5. 
S1 is N, Sx and Sy are N times the respective means and the other
sums are N times various moments and product moments measured 
about the origin. Any convenient arbitrary origin may be used in 
finding the summations provided the same origin is used in express-
ing the result in (9.3). If the data are grouped into arrays, the
array means $\bar{y}_s$ may be used instead of the individual values, and
the summations are then $Sn_s\bar{y}_s$, $Sn_s$, $Sn_sx\bar{y}_s$, etc., using the same
notation as before. A similar series of equations obtained by inter-
changing x and y in equations (9.3) and (9.4) gives the regression
of x on y.
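
The normal equations (9.4) can be formed and solved mechanically. The following Python sketch is an illustration only: the function name is invented, the power sums are accumulated directly from the individual values, and the equations are solved by ordinary elimination rather than by any method described in the text.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares constants [a, b, c, ...] of Y = a + b x + c x^2 + ..."""
    k = degree + 1
    # Power sums S x^p (p = 0 .. 2*degree) and S x^p y (p = 0 .. degree).
    sx = [sum(x ** p for x in xs) for p in range(2 * degree + 1)]
    sxy = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(k)]

    # Augmented matrix of the normal equations: row i is
    # S x^i y = a S x^i + b S x^(i+1) + c S x^(i+2) + ...
    rows = [[sx[i + j] for j in range(k)] + [sxy[i]] for i in range(k)]

    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, k):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, k + 1):
                rows[r][c] -= factor * rows[col][c]

    # Back substitution.
    constants = [0.0] * k
    for i in reversed(range(k)):
        known = sum(rows[i][j] * constants[j] for j in range(i + 1, k))
        constants[i] = (rows[i][k] - known) / rows[i][i]
    return constants


# Made-up values lying exactly on Y = 1 + 2x + 3x^2.
xs = [-2, -1, 0, 1, 2]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
constants = fit_polynomial(xs, ys, 2)
```

Measuring x from an origin near its mean, as the text suggests, keeps the power sums small and the equations better conditioned.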

The polynomial equation is very adaptable, and according to the 
values of the constants a, b, c, etc., it may take a wide variety of 
forms with only a comparatively few constants. Indeed, for practical 
purposes it is a suitable way of expressing nearly all kinds of regres- 
sion except those in which the curve of y undulates and has more 
than two or three maxima and minima. 

In the analysis of variance given in section 9.2, a polynomial
equation may be used instead of the array means, and it absorbs
as many degrees of freedom as there are constants, the constant a
accounting for the one degree previously attributed to the grand
mean of y. Curves of the first, second, third and higher orders may
be fitted successively if desired, and as more and more terms are 
used, the equation fits the observations more and more closely, until 
ultimately an equation with as many constants as there are observa-
tions does so perfectly. This progressive improvement in the closeness
of fit may be seen by finding the deviations of the observed values
from those given by the regression and squaring and adding them,
i.e. by finding $S(y - Y)^2$. As more and more terms are used this
sum of squares decreases until, when the fit becomes perfect, it
becomes zero. If the constants are determined by the method of
least squares and the fluctuations in y are purely random, the reduc-
tion in sum of squares at each stage tends to be the same; it is the
variance associated with one degree of freedom, differing from the 
true variance only by the extent of the random errors. If the data 
show a comparatively simple form of variation, predominating over 
the random fluctuations, the earlier terms of the equation, which 
express the simpler movements, reduce the sum of squares by an 
amount significantly greater than the later ones. When sufficient terms 
are included to express this slow movement of y with x, and the 
residual deviations are random, each of the remaining possible terms 
reduces the remaining sum of squares by the same amount, within 
the limits of random errors. Thus, at each stage in fitting equations 
with more terms, the reduction in residual sum of squares due to 
one degree of freedom can be compared with the variance estimated 
from the residual; if the former is significantly greater than the 
latter, the last term expresses a real feature of the trend. When the 
stage is reached that the variances associated with the last one or 
two terms are not significantly greater than that estimated from the 
residual, it is presumed that the regression has been adequately 
expressed and the deviations are random. This presumption is only 
justified, however, if the polynomial is the form appropriate for 
describing the regression. This technique of fitting and testing curves 
should not be followed blindly, and a certain amount of judgment 
and reference to a diagram may be necessary; it is possible that the 
first term or two may account for very little of the variance, but 
that some of the later ones may be very important; this would be 
the case in the curve of Fig. 12, where the linear term would be
very small but the second one would be important. On the other 
hand, if a very large number of terms is required, it may be because
the polynomial form is not suitable. A regression equation fitted to
any data must not only satisfy these statistical tests, but it must
appear appropriate when plotted on a diagram. 

Every time an equation with an additional constant is tried, 
equations (9.4) have to be solved afresh and new values of all the 
constants obtained. Thus, a third order curve requires different
values of a and b from those appropriate to the second order. The
equation (9.3) may be expressed in a so-called orthogonal form con-
sisting of a number of terms of successively higher orders in x, each
term being multiplied by a constant to be determined from the
data. These terms are independent in that they may be determined 
one at a time, and the addition of the higher orders does not alter 
the constants of the lower orders previously determined. This course 
is easiest when the values of the independent variable are at equal 
intervals and there is one value of the dependent for each value of 
the independent variable, as in a time series. Fisher (1936) describes 
a convenient system for doing this, and we shall illustrate it by 
fitting a curve of the third degree to the protein content data of 
Table 8.1. 


9.31. We shall call the mean protein content (the observed values)
y, and the year measured from the middle year 1910, t (i.e. for 1908,
t = -2), and will fit a curve of the form,

$$Y = a + bt + ct^2 + dt^3 \qquad (9.5)$$

to the 29 values (N = 29).*

This Fisher transforms to

$$Y = A + BT_1 + CT_2 + DT_3, \qquad (9.6)$$

where

$$T_1 = (t - \bar{t}) = t \quad (\text{since } \bar{t} = 0),$$

$$T_2 = T_1^2 - \frac{N^2 - 1}{12} = t^2 - 70,$$

and

$$T_3 = T_1^3 - \frac{3N^2 - 7}{20}\,T_1 = t^3 - 125.8\,t.$$

* If there is an even number of years, t = 0 must still be at the centre,
so that the values of t must be +0.5, +1.5, . . . -0.5, -1.5 . . .
The constants are given by the relations

$$A = \frac{1}{N}\,Sy,$$

$$B = \frac{12}{N(N^2-1)}\,SyT_1 = \frac{1}{2\,030}\,Syt,$$

$$C = \frac{180}{N(N^2-1)(N^2-4)}\,SyT_2 = \frac{1}{113\,274}\{Syt^2 - 70\,Sy\},$$

$$D = \frac{2\,800}{N(N^2-1)(N^2-4)(N^2-9)}\,SyT_3 = \frac{5}{30\,292\,704}\{Syt^3 - 125.8\,Syt\}. \qquad (9.7)$$



The summations Sy, Syt, etc., have to be found from the data, and 
Fisher gives a convenient method of evaluating them by a series of 
additions, but for such a small number of observations as we have, 
it is not very much trouble to obtain the summations directly, 
particularly if a calculating machine is available. For the protein 
percentages they are, 


$Sy_1 = 411.48$,                $Sy_2 = 240.78$,
$Sy_1t = +342.98$,              $Sy_2t = -195.36$,
$Sy_1t^2 = 28\,408.36$,         $Sy_2t^2 = 17\,674.16$,
$Sy_1t^3 = +49\,329.62$,        $Sy_2t^3 = -26\,716.68$.


If ${}_1y_1, {}_2y_1, {}_3y_1, \ldots {}_{29}y_1$ are the individual readings,

$$Sy_1t = ({}_{29}y_1 - {}_1y_1)\cdot 14 + ({}_{28}y_1 - {}_2y_1)\cdot 13 + \ldots + ({}_{16}y_1 - {}_{14}y_1)\cdot 1,$$

$$Sy_1t^2 = ({}_{29}y_1 + {}_1y_1)\cdot 14^2 + ({}_{28}y_1 + {}_2y_1)\cdot 13^2 + \ldots + ({}_{16}y_1 + {}_{14}y_1)\cdot 1^2,$$

$$Sy_1t^3 = ({}_{29}y_1 - {}_1y_1)\cdot 14^3 + ({}_{28}y_1 - {}_2y_1)\cdot 13^3 + \ldots + ({}_{16}y_1 - {}_{14}y_1)\cdot 1^3,$$

etc., etc.,

when the values of t are symmetrically placed about the zero; the
labour is diminished if the terms of the brackets are found first,
giving two series of fourteen values, one for use when multiplying
by the odd powers of t and the other when multiplying by the even
powers.

Substituting the above values in (9.7) we obtain, 


$$A_1 = \frac{411.48}{29} = 14.188\,97, \qquad A_2 = \frac{240.78}{29} = 8.302\,76,$$

$$B_1 = \frac{342.98}{2\,030} = 0.168\,96, \qquad B_2 = \frac{-195.36}{2\,030} = -0.096\,24,$$

$$C_1 = \frac{-395.24}{113\,274} = -0.003\,489\,2, \qquad C_2 = \frac{819.56}{113\,274} = 0.007\,235\,2,$$

$$D_1 = \frac{5 \times 6\,182.736}{30\,292\,704} = 0.001\,020\,5, \qquad D_2 = \frac{-5 \times 2\,140.392}{30\,292\,704} = -0.000\,353\,3.$$


These constants can be substituted in equation (9.6). 
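
The substitution can be repeated by machine as a check. The sketch below is an illustration: the function name is invented, the general divisors are those of equations (9.7), and the two calls use the summations quoted above for the two protein series.

```python
def orthogonal_cubic_constants(n_obs, sy, syt, syt2, syt3):
    """Constants A, B, C, D of Y = A + B*T1 + C*T2 + D*T3.

    Assumes t takes equally spaced values centred on zero, one y per t.
    """
    n2 = n_obs ** 2
    k2 = (n2 - 1) / 12          # 70 when N = 29
    k3 = (3 * n2 - 7) / 20      # 125.8 when N = 29
    a = sy / n_obs
    b = 12 * syt / (n_obs * (n2 - 1))
    c = 180 * (syt2 - k2 * sy) / (n_obs * (n2 - 1) * (n2 - 4))
    d = 2800 * (syt3 - k3 * syt) / (n_obs * (n2 - 1) * (n2 - 4) * (n2 - 9))
    return a, b, c, d


series_1 = orthogonal_cubic_constants(29, 411.48, 342.98, 28408.36, 49329.62)
series_2 = orthogonal_cubic_constants(29, 240.78, -195.36, 17674.16, -26716.68)
```

Because the terms are orthogonal, dropping D (or C and D) leaves the remaining constants unchanged, which is the practical advantage over solving equations (9.4) afresh for each order.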

It is not necessary to calculate the residual deviations explicitly,
for the successive terms reduce the sums of squares by the following
amounts:

$$NA^2, \quad \frac{N(N^2-1)}{12}B^2, \quad \frac{N(N^2-1)(N^2-4)}{180}C^2, \quad \frac{N(N^2-1)(N^2-4)(N^2-9)}{2\,800}D^2, \quad \text{etc.}$$

The first of these has been encountered before, for it is the ordinary
way of correcting a sum of squares of deviations from some origin
to a sum of squares of deviations from the mean;

$$NA^2 = N\bar{y}^2 \quad\text{and}\quad S(y - \bar{y})^2 = Sy^2 - N\bar{y}^2,$$

and the other terms may be regarded as correcting to a moving mean,
the first to a mean which gradually increases (or diminishes) according
to a straight line law, for

$$S(y - \bar{y})^2 - \frac{N(N^2-1)}{12}B^2 = S(y - A - Bt)^2,$$


and the others to a mean the movements of which become more 
complicated as more and higher terms in the regression are used. 

We are only concerned with the analysis of sums of squares of
deviations from the means, and these are given in the "Total" row
of Table 9.4 for the two series of protein contents, while the
variances of the terms in the regression are given in higher rows.
The variances of the second and third terms are both greater than
the residuals, and we may test their significance by finding

$$z = \tfrac{1}{2}\log_e\frac{\text{variance of term}}{\text{residual variance}}$$

and seeing if they are above the 0.05 level when $n_1 = 1$ and $n_2 = 25$,
or alternatively by using the result of section 6.5 and finding

$$t = \sqrt{\frac{\text{variance of term}}{\text{residual variance}}}$$

and testing their significance for 25 degrees of freedom. Fisher's
tables of t give 2.060 lying on the 0.05 level of significance, so that
the first order terms are both real (we arrived at this conclusion in
section 8.1 when finding the correlation coefficient), and so are the
third order terms of the first series and the second order terms of
the second.

TABLE 9.4
Analysis of Variance

                            Degrees of       Series 1               Series 2
Source of Variation         Freedom      Sum of                 Sum of
                                         Squares    Variance    Squares    Variance

First order regression          1         57.95      57.95       18.80      18.80
Second order regression         1          1.38       1.38        5.93       5.93
Third order regression          1          6.31       6.31        0.76       0.76
Residual                       25         12.34       0.494      12.06       0.482

Total                          28         77.98        -         37.55        -
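
The t-tests on the separate terms can be reproduced from the variances of Table 9.4. The sketch below is illustrative only: the dictionary layout and names are invented, and the critical value 2.060 is the one quoted from Fisher's tables for 25 degrees of freedom.

```python
import math

T_CRITICAL = 2.060   # 5 per cent point of t for 25 degrees of freedom (as quoted)

term_variances = {
    "series 1": {"first": 57.95, "second": 1.38, "third": 6.31},
    "series 2": {"first": 18.80, "second": 5.93, "third": 0.76},
}
residual_variances = {"series 1": 0.494, "series 2": 0.482}

# t is the square root of the ratio of a one-degree variance to the residual.
t_values = {
    series: {term: math.sqrt(v / residual_variances[series])
             for term, v in terms.items()}
    for series, terms in term_variances.items()
}
significant = {
    series: {term: t > T_CRITICAL for term, t in terms.items()}
    for series, terms in t_values.items()
}
```

For the first series the third order term gives t of about 3.6 and the second order term about 1.7; for the second series the positions are reversed, in agreement with the conclusion stated above.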

To test whether the cubic regression line as a whole gives a better
fit than a linear one, we find

$$z = \tfrac{1}{2}\log_e\frac{3.84}{0.494} = 1.025$$

for the first series and

$$z = \tfrac{1}{2}\log_e\frac{3.34}{0.482} = 0.968$$

for the second series, and the degrees of freedom are $n_1 = 2$, $n_2 = 25$.

Fisher gives for such a sample the 1 per cent point of z as 0.858 5,
so the cubic regression is significantly different from the linear one
in both instances.
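
The joint test of the second and third order terms pools their two sums of squares over two degrees of freedom. A minimal sketch, using the figures of Table 9.4 and the 1 per cent point quoted above (the function name is invented):

```python
import math


def z_statistic(sums_of_squares, residual_variance):
    """Fisher's z for several one-degree terms pooled against the residual."""
    pooled_variance = sum(sums_of_squares) / len(sums_of_squares)
    return 0.5 * math.log(pooled_variance / residual_variance)


ONE_PER_CENT_POINT = 0.8585   # z for n1 = 2, n2 = 25, as quoted from Fisher

z_series_1 = z_statistic([1.38, 6.31], 0.494)   # about 1.03
z_series_2 = z_statistic([5.93, 0.76], 0.482)   # about 0.97
```

Both values lie above the 1 per cent point, so the two extra terms taken together improve on the straight line in each series.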

It may be argued that there is no reason why we should stop at 
the third order term, but Fig. 13 shows that the data are quite well 
followed by the cubic curves, and from Table 9.4 we see that 
the residual variances for the two series are practically equal, 



suggesting that they may both be a result of the same random 
causes. We shall use these data later, and assume that cubic equations 
eliminate sufficiently the systematic trend in protein content.


9.32. There are forms of equation other than the polynomial that
may be appropriate in some instances; for example, equations 
involving trigonometric functions are suitable for expressing many 
types of periodic fluctuation with time. For smoothing economic 
time series, many special methods have been devised, and are often 
favoured either because they describe complicated movements more
readily than a simple polynomial does or because they are a mathe- 
matical expression of some economic generalisation. We cannot deal 
with these here, but would remark that many of these curves cannot 
be fitted by least squares or some equivalent method, so it is not 
possible to say how many degrees of freedom are absorbed, and the 
criteria of fit developed in this chapter may not be applied. In the 
absence of alternative criteria, this may be a disadvantage, although 
on the other hand it may perhaps be argued that in the absence of 
criteria for testing whether a suitable form of equation has been 
chosen, it is idle to place too much emphasis on criteria for testing
whether the number of terms in the equation is appropriate. 

9.4. It may be well to emphasise the fact that since all the tests of
the preceding sections are extensions of the analysis of variance, 
they are based on the assumptions, discussed in section 6.7, of 
normality and homogeneity of the residual variations. 

Contingency 

9.5. In section 4.4 we have described the formation of contingency
tables when the two characters can only be expressed in broad
qualitative groups, and how the existence of association may be tested
by calculating $\chi^2$, but we have not given any method of expressing
the strength of association. The best constant is probably Pearson's
(1904) mean square contingency coefficient.

Consider the rth row and sth column in a contingency table, let
the number of individuals in the cell common to both be $n_{rs}$, the
total in the rth row be $n_{r\cdot}$, that in the sth column be $n_{\cdot s}$ and the
grand total be N. Then the expected frequency in the cell is
$n_{r\cdot}n_{\cdot s}/N$, and we have seen that

$$\chi^2 = S\left\{\frac{(n_{rs} - n_{r\cdot}n_{\cdot s}/N)^2}{n_{r\cdot}n_{\cdot s}/N}\right\}.$$

Pearson defines the mean square contingency as

$$\phi^2 = \frac{\chi^2}{N},$$

and the mean square contingency coefficient as

$$C_1 = \sqrt{\frac{\phi^2}{1 + \phi^2}}.$$


This latter constant was chosen because as the grouping becomes
very fine (i.e. as the number of cells becomes large), $C_1$ approaches
the value of the correlation coefficient r, if the population is normal.
Unfortunately, like the correlation ratio, $C_1$ depends on the form
and fineness of grouping, but it gives an appreciation of the strength
after the reality of an association has been established by the
method of Chapter IV, and does not depend on the order in which
the groups are arranged. The contingency coefficient is not reliable
when there are fewer than about sixteen cells, and, as in calcu-
lating $\chi^2$, cells with very few entries should be avoided as far as
possible; the constant is thus only suitable for large samples. Subject
to this precaution, the finer the grouping the better, except that the
computation becomes very laborious with a very large number of
cells. For the smallpox data of Table 4.5,

$$\phi^2 = 0.116\,2 \quad\text{and}\quad C_1 = \sqrt{\frac{0.116\,2}{1.116\,2}} = 0.32.$$

It would be instructive to combine the groups of the earlier correlation tables in
various ways so as to form contingency tables with different numbers
of cells and thus to see how the contingency coefficients approach 
the correlation coefficients. 
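
The calculation of the contingency coefficient from a table of frequencies can be sketched as follows. The function name and the small 2 x 2 table are invented for illustration (the smallpox table itself is not reproduced here); only the final line uses a figure from the text, the value 0.116 2 of the mean square contingency.

```python
import math


def contingency_coefficient(table):
    """Return (chi^2, phi^2, C1) for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    chi_squared = 0.0
    for r, row in enumerate(table):
        for s, observed in enumerate(row):
            expected = row_totals[r] * column_totals[s] / grand_total
            chi_squared += (observed - expected) ** 2 / expected
    phi_squared = chi_squared / grand_total
    return chi_squared, phi_squared, math.sqrt(phi_squared / (1 + phi_squared))


# Made-up table; far too few cells for a reliable coefficient, used only
# to show the arithmetic.
chi_squared, phi_squared, c1 = contingency_coefficient([[10, 20], [20, 10]])

# The smallpox figure quoted in the text.
c1_smallpox = math.sqrt(0.1162 / 1.1162)   # about 0.32
```

Regrouping the rows or columns of a table and calling the function again shows directly how the coefficient changes with the fineness of grouping.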

The equivalence of the constants of this section to the correlation 
coefficient depends on the assumption that the variates are distributed 
normally, but that is no real limitation, for when they can only be 
expressed qualitatively we may reasonably imagine a quantitative 
description on such a scale that its distribution is normal. 



CHAPTER X 


THE FURTHER THEORY OF ERRORS AND PRINCIPLES 
OF EXPERIMENTAL ARRANGEMENT 


Theory of Errors 

10.1. The sampling distributions used in the applications of the
theory of errors in Chapters II to V have all been based on the 
assumption that the individuals in a sample are independent. When, 
however, the variation is heterogeneous and the extent to which the 
sources of variation are represented is not left to chance, the indi- 
viduals are not independent and the simple theory does not apply. 
This condition arises in Table 6.1 (p. 127) where each shrub is
represented equally and the ovaries from one shrub are related. 
The same general theory of statistical inference applies, since it 
does not depend on any assumptions of independence, but the 
sampling distributions and standard errors have to be modified; 
we shall discuss these modifications and some of their consequences 
in the next three sub-sections. 


Random Sampling from Unlimited Field 

Considering the sample of ovaries of Table 6.1 (p. 127) as 
representing an infinite population of shrubs, let us estimate the 
standard error of the mean. Using the simple theory, we should 
have estimated the standard deviation from the distribution in the 
“totals’’ column, and from Table 6.2 we see that this gives as an 
estimate of standard error the value 


$$\pm\sqrt{\frac{5.385}{1\,000}} = \pm 0.073\,4.$$


We know this to be incorrect, since the ovaries are not independent. 
However, the 10 shrub-means are independent and may be regarded 
as a small random sample from the infinite population of shrub- 
means. From Table 6.2, the variance between shrub-means after 
allowing for weighting is 2.614 92 and the standard error based on
9 degrees of freedom is

$$\pm\sqrt{\frac{2.614\,92}{10}} = \pm 0.511\,4.$$

Subject to the limitations of a small sample, this estimate is correct,
and is about seven times the incorrect estimate.

It may be seen how this difference arises when allowance is made
for the fact that the standard error of the mean is made up of two
parts, one due to the true variance between shrubs, $\sigma_s^2$, and the
other due to the variance within a shrub, $\sigma_r^2$. If there are m shrubs
and n ovaries per shrub, the first source of variation is only repre-
sented by m individuals and the second by mn, and therefore we have

$$\text{Standard Error of Mean} = \sqrt{\frac{\sigma_s^2}{m} + \frac{\sigma_r^2}{mn}}, \qquad (10.1)$$

and from equation (6.3), p. 129, we see that the best estimate of
this is $\sqrt{v_s/nm}$. The simple theory assumes that both sources of
variation are sampled independently mn times. If we took 1 000
ovaries one at a time from the population of shrubs, leaving it to
chance from which shrubs they came, this condition would be
satisfied and we should have

$$\text{Standard Error of Mean} = \sqrt{\frac{\sigma_s^2 + \sigma_r^2}{mn}}. \qquad (10.2)$$

If the estimates of $\sigma_s^2$ and $\sigma_r^2$ given in section 6.2 are substituted
in (10.2) the result differs only slightly from the first estimate of
the standard error given in this chapter.
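
The two standard errors can be compared numerically. The sketch below uses the figures quoted in this chapter (total variance 5.385, variance between shrub-means 2.614 92, within-shrub variance 3.057, with m = 10 and n = 100). The estimate of the between-shrub component is derived here by subtraction and is an assumption of the sketch, since the values given in section 6.2 are not reproduced in this passage.

```python
import math

m, n = 10, 100                       # shrubs, ovaries per shrub
total_variance = 5.385               # from the "totals" column
variance_of_shrub_means = 2.61492    # v_s / n, allowing for weighting
within_variance = 3.057              # v_r, estimate of sigma_r^2

# Simple (incorrect) theory: 1 000 supposedly independent ovaries.
naive_se = math.sqrt(total_variance / (m * n))

# Correct treatment: ten independent shrub means.
shrub_se = math.sqrt(variance_of_shrub_means / m)

# Assumed estimate of sigma_s^2, found as (v_s - v_r) / n.
sigma_s2 = variance_of_shrub_means - within_variance / n

se_10_1 = math.sqrt(sigma_s2 / m + within_variance / (m * n))   # equation (10.1)
se_10_2 = math.sqrt((sigma_s2 + within_variance) / (m * n))     # equation (10.2)
```

Equation (10.1) reproduces the standard error of the shrub means, while (10.2) gives a figure close to the naive estimate, which is the point made in the text.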


Complex samples of the kind exemplified by the 1 000 ovaries,
with 100 from each shrub, are not uncommon, and standard errors 
may easily be estimated if the sample can be divided into a number 
of equal, independent sub-samples or units like the shrubs. For 
example, the 500 pairs of brothers mentioned in section 3.1 are 
500 independent units to which the sampling theory applies. To
find the standard error of the mean character of the sample of cotton 
hairs collected in tufts as described in section 3.1, it would be 
necessary to measure the mean character separately for each of the 
reduced tufts, say 64 in number, and then calculate the standard 
error of the grand mean of these 64 tuft means. Sometimes, the 
sub-samples themselves are not independent, but have to be grouped 
into yet larger units, and there may be a whole hierarchy of units, 
only the largest of which are independent. Thus, eight tufts of 100 
cotton hairs may be taken from each of eight bales to sample a 
consignment of cotton, and since only the bales are independent,
we must reduce our 6 400 hairs to a small sample of eight bale means. 

The same considerations apply to means of observations in a 
time (or space) series, say to the mean annual rainfall obtained from 
the records for a number of years. The total variations are due to 
residual deviations and to a gradual trend, which may perhaps be
expressible by a polynomial equation with a few constants ; the trend 
corresponds to the shrub or tuft variations in the above examples, 
and is not sampled as many times as there are years. Indeed, it is 
difficult to say how many times it is sampled, and exactly what
contribution it makes to the standard error of the mean, and we 
can only be sure that if the secular trend is at all important, the 
simple standard error given by $\sigma/\sqrt{N}$ is a serious underestimate
of the true one. 

The situation may be described in another way by stating that 
when the variation is heterogeneous, and the individuals are not 
independent, the variations between the individuals in a sample are 
not of the same kind as those between samples. Consequently, the 
simple sampling theory which estimates the variance between 
samples from that within a sample, cannot apply. This general 
situation may arise in several ways. For example, a laboratory that
conducts routine determinations of the chemical or biological 
qualities of materials may attempt to estimate its errors by doing 
replicate tests. If the replicates are always done in parallel, going 
through the various preparations and treatments together and on 
the same day, they are not subject to the same errors as replicates 
done on different days and subjected to processes of determination 
that are independent from the beginning. Consequently, standard 
errors calculated from the parallel replicates are not valid estimates 
of the error in the comparisons of independent determinations made 
on different materials on different days. It is essential to ensure 
that the replicates on one material should be subject to the same 
variations as the determinations of the different materials to be 
compared. It may be suggested here that where much routine 
sampling or testing of a particular kind is done, replicate independent 
samples or tests of a number of populations or materials may provide 
a good combined estimate of the error of that kind of determination. 
For example, the replicates may be duplicate determinations, and 
the variance may be estimated from the equation for pairs given
earlier. Before doing this, however, it is well to conduct an




investigation to see that the estimates of variance provided by the 
separate sets of replicates do not differ among themselves by an 
amount greater (or less) than may be attributed to random errors. 
This may be done by Neyman and Pearson’s test, section 5.5, if 
there are only a few sets of replicates; if there are many sets, an 
approximate test is provided by seeing that the variance of the
separate estimates of variance is not significantly greater than $2v^2/N$
(section 3.55), where v is the mean variance for all sets and N is
the number of replicates per set. Another line of investigation is to
test for the existence of a correlation between the variance and mean
quality of each set. 

Economy in Sampling 

10.12. When the variation in a population is heterogeneous, the
arrangement may have an influence on the precision attainable with
a given size of sample. We shall discuss here the best way of dis-
tributing observations between and within groups when there are
those two simple sources of variation.

If we use the notation of the previous section and let the total
number of observations, mn, be N, the standard error of the mean
given by equation (10.1) reduces to

$$\sqrt{\frac{n\sigma_s^2 + \sigma_r^2}{N}}.$$

For a given number of individuals, N, this is least when n is smallest,
i.e. when n = 1. That is to say, unless there are technical objections,
for a constant number of observations the maximum accuracy in the 
mean is obtained when each one is selected separately and indi- 
vidually from a separate group ; and in that case the standard error 
found in the ordinary way from the total variance is applicable. The 
reader will agree that this conclusion is merely common sense. 

However, technical difficulties often intervene; for instance, we
have mentioned in section 4.1 that in selecting cotton hairs indi- 
vidually there is a tendency to prefer the long ones, and to overcome 
this, hairs have to be taken in tufts. 

Another modification is necessary when it takes more time to 
increase the number of groups than to increase the number of 
observations within a group. Smith and Prentice (1929), in obtaining 
soil samples for counts of cysts, took a number of "borings" of soil,
and then made several counts on each boring; if there were no other
considerations, it would be best to take many borings and to make 
one count on each, but we must take account of the fact that it may 
take more time to obtain a boring than to make a count on one. If 
there are n counts on each of m borings, $\sigma_s^2$ is the variance between
borings in the infinite population, $\sigma_r^2$ that of counts within a boring,
t the time to make a count and kt the time to make a boring, we have
for the total time to obtain the N = mn counts,


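$$T = mnt + mkt; \qquad (10.3)$$

and the square of the standard error of the grand mean,

$$\sigma_{\bar{x}}^2 = \frac{\sigma_s^2}{m} + \frac{\sigma_r^2}{mn}. \qquad (10.31)$$

Now from (10.3),

$$mn = \frac{T}{t} - mk,$$

and substituting this in (10.31) we obtain

$$\sigma_{\bar{x}}^2 = \frac{\sigma_s^2}{m} + \frac{\sigma_r^2}{T/t - mk}.$$

In order to find the best way of distributing a constant time T
between taking borings and making counts, we must find the value
of m which makes $\sigma_{\bar{x}}^2$ a minimum. Performing this operation in the
usual way by making

$$\frac{d\sigma_{\bar{x}}^2}{dm} = 0$$

and substituting, we obtain

$$n^2 = k\,\frac{\sigma_r^2}{\sigma_s^2}. \qquad (10.32)$$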

Smith and Prentice found estimates for $\sigma_s$ and $\sigma_r$ to be 40.5 and
23.1 per cent of the mean, and if we assume it takes five times as
long to take a boring as to make a count, we find from (10.32) that
the time is best employed when n = 1.3; i.e. under such circum-
stances it would be better to increase the number of borings than to
take more than two counts on each one. Examples of this sort arise
in other connections, and equation (10.32), which is only intended
to give a rough guide, may be found useful. Similar considerations 
also arise in estimating other constants. 
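
The optimum number of counts per boring follows from minimising the square of the standard error for a fixed total time. The sketch below is an illustration with invented function names: it evaluates $n = \sqrt{k}\,\sigma_r/\sigma_s$ for Smith and Prentice's figures, and also computes the variance of the mean for any n under a fixed budget so that the minimum can be checked directly. The budget of 100 time units is arbitrary.

```python
import math


def optimal_counts_per_boring(sigma_s, sigma_r, k):
    """n that minimises the standard error for a fixed total time."""
    return math.sqrt(k) * sigma_r / sigma_s


def variance_of_mean(n, sigma_s, sigma_r, k, time_units):
    """Squared standard error when T/t = time_units and n counts are made
    on each boring, so that m = time_units / (n + k) borings are taken."""
    m = time_units / (n + k)
    return sigma_s ** 2 / m + sigma_r ** 2 / (m * n)


n_best = optimal_counts_per_boring(40.5, 23.1, 5)   # about 1.3

at_best = variance_of_mean(n_best, 40.5, 23.1, 5, 100)
at_one = variance_of_mean(1.0, 40.5, 23.1, 5, 100)
at_two = variance_of_mean(2.0, 40.5, 23.1, 5, 100)
```

In practice n must be a whole number, and the variance is only slightly larger at n = 1 or n = 2 than at the theoretical optimum, which is why the text treats the formula as a rough guide.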

Sampling from Limited Field 

10.13. In considering the sampling theory connected with the data
of Table 6.1, we have regarded the ten shrubs as a random sample 
from an infinite population of shrubs. Sometimes, however, there 
may be only ten shrubs in the population under investigation, 
i.e. the field of sampling may be limited, although an infinite popu- 
lation of ovaries may be postulated. Then there are two possible 
methods of sampling. 

In the first, the method of unrestricted random sampling, ovaries 
are taken entirely at random without any regard being paid to the 
shrub from which they come. Then, in a finite sample, the numbers 
of ovaries from the several shrubs are not necessarily the same and 
in general shrub variations contribute something to the sampling 
errors. The “totals” column of Table 6.1 represents this population 
of indiscriminately mixed ovaries from the ten shrubs, and the 
variance estimated from this provides the standard deviation from 
which the standard error of the mean has already been estimated 
as ± 0.0734. The equation for this standard error is equation (10.2), 
except that when $\sigma_b^2$ is obtained from the sample as in section 6.2 
it should be multiplied by (m - 1)/m; if the number of shrubs is 
limited, $\sigma_b^2$ is not estimated, it is determined. In the subsequent 
discussion we shall neglect this correction to equation (10.2). 

The second method is termed by Bowley (1926) selection in strata 
and it consists in dividing the population into a number of parts or 
strata and in taking a sample of individuals at random except that 
the number taken from each stratum is proportional to the number 
present. In our example the ten shrubs are the strata and they may 
be presumed to be approximately equal in size. Then since each 
shrub is equally represented in the sample, as far as shrub variations 
are concerned the sample is an exact representation and sampling 
errors are due entirely to within-shrub variations. The within-shrub 
variance is 3.057, so the standard error of the mean of N ovaries 
sampled in strata is 

$$\sqrt{3.057/N},$$

or generally, 

$$\sigma_w/\sqrt{N}.$$

In the language of section 3.7, the method of selection by strata is 

$$100\,\frac{\sigma_b^2}{\sigma_w^2} \text{ per cent.}$$

more efficient than purely random selection; for the data of Table 6.1 
this difference in efficiency is 76 per cent. 
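
The figures just quoted can be checked with a few lines of Python. 
This is only a sketch: the sample size of 1000 assumes 100 ovaries 
from each of the ten shrubs, and the efficiency gain is computed 
here as the ratio of the two sampling variances, less one. 

```python
import math

within_variance = 3.057   # within-shrub variance quoted in the text
n_ovaries = 1000          # assumed: 10 shrubs x 100 ovaries each
se_random = 0.0734        # standard error for unrestricted random sampling

# Standard error of the mean when sampling in strata.
se_strata = math.sqrt(within_variance / n_ovaries)

# Gain in efficiency: ratio of the sampling variances, less one, as a percentage.
gain_per_cent = 100 * ((se_random / se_strata) ** 2 - 1)

print(round(se_strata, 5), round(gain_per_cent))
```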

The method of sampling by strata has a wide application since 
most populations are limited in extent, even though they may have 
so many individuals that the conception of an infinite population is 
reasonable. For example, an agricultural plot can be divided into 
areas for sampling ears of corn, say; a town can be sampled in 
wards and streets in a sociological survey as suggested in section 3.1; 
or the products of a factory can be divided into strata representing 
the output of different machines, operatives or periods of time. The 
degree of superiority of sampling by strata to simple random 
sampling depends on the variance between the strata compared with 
that within them, and there is an art in choosing the lines of 
division so as to make the ratio as large as possible. A knowledge 
of the sources of variation and of their relative importance such as 
that given by an investigation using the analysis of variance and by 
a priori technical knowledge is of great assistance. When there are 
no real variations between the strata, i.e. when $\sigma_b^2 = 0$, sampling 
by strata is at its worst, and even then it is as good as random 
sampling. It should be noted that unless there are at least two 
individuals from each stratum, $\sigma_w^2$, and hence the precision of the 
sample, cannot be estimated. 

The Principles of Experimental Arrangement 
10.2. All the foregoing considerations apply to the determination 
of a mean; but quite often we are interested, not in the absolute 
value of the mean, but in the extent to which the mean changes with 
some change in conditions (varied, perhaps, experimentally). From 
the point of view of such experiments, the variations are disturbing 
factors which the observer has to average out by taking large numbers 
of observations. If the variability is heterogeneous, however, it is 
often possible so to arrange the experiment that the group differences 
affect all the conditions or treatments equally, so that only the smaller 
residual variations contribute to the errors in the comparisons, giving 
an increased accuracy for the same number of observations. This is 

Harris, in the paper from which the data for Table 6.1 were 
taken, says that there were three series of ovaries: 

A, ovaries from opened flowers which were shaken from the shrub; 

B, ovaries from opened flowers which were apparently continuing 
their development at the time of sampling; and 

C, ovaries which had completed their development to mature fruit. 

He wished to discover if series C was different from series B 
and (more particularly) from A, and compared among other things 
the mean numbers of ovules per ovary.* Now if each series had 
been taken at random from different lots of shrubs, the standard 
error of each mean would have been calculated from the variance 
between trees, and would have been ± 0.5114, as calculated in 
section 10.11. Actually, however, each series was taken from the same 
shrubs, and by comparing corresponding means, the shrub variation 
was eliminated. For the purpose of such a comparison, the standard 
error of the grand mean is obtained from the smaller intra-shrub 
variance, and is that given in section 10.13 for selection by strata, 
viz. 

$$\sqrt{3.057/1000} = \pm 0.05529,$$

or about one-ninth of the standard error for a random collection of 
ovaries from two series of independent trees. This increase of 
precision may be expressed by the fact that to obtain the same accuracy, 
the number of ovaries from two independent series of shrubs would 
have to bear to the number for the restricted arrangement the ratio 

$$\frac{(0.5114)^2}{(0.05529)^2},$$

or about 86 : 1 (assuming 100 ovaries to be taken from each shrub). 

A simple instance of the advantage of arranging the experiment 
or collection of data so that large variations are eliminated is that of 
comparing the means of two parallel series of observations, and as 
an example we will compare the mean head breadths of termites from 
nests 668 and 670 (Table 6.8, p. 137). First, ignoring the fact that 
corresponding pairs of breadths are from the same month by treating 
the 10 observations as though they were independent, we find 
according to section 5.3, 

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{0.1916}{0.0759 \times 0.633} = 4.00.$$

Now, taking the five differences between corresponding pairs from 
the same month, we use the method of section 5.3 and find 

$$t = \frac{\bar{d}\sqrt{N}}{s} = \frac{0.1916 \times 2.236}{0.08109} = 5.28.$$

This increased value of t means that by comparing samples taken in 
parallel months, the significance has been increased. 

* Actually, Harris compared the ovules per loculus, and we have taken the 
ovary as the unit because its distribution showed the heterogeneity of the 
variance better. However, Harris only dealt with ovaries with three loculi, 
so that it does not affect our argument. 
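
The two values of t may be recomputed from the quantities quoted 
above. The sketch below is illustrative only; the variable names 
are ours, and the inputs are the rounded figures given in the text, 
so the last digit of either result may differ slightly from the 
printed value. 

```python
import math

mean_difference = 0.1916      # difference between the two nest means
pooled_sd = 0.0759            # s when the 10 readings are treated as independent
sd_of_differences = 0.08109   # s for the five monthly differences
n_pairs = 5

# Two independent samples of five readings each.
t_independent = mean_difference / (pooled_sd * math.sqrt(1 / n_pairs + 1 / n_pairs))

# Five paired differences: the month-to-month variation is removed.
t_paired = mean_difference * math.sqrt(n_pairs) / sd_of_differences

print(round(t_independent, 2), round(t_paired, 2))
```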

This fact may be expressed algebraically for the simple case; if 
there are two series of quantities, $x'_1$ and $x'_2$, both measured directly 
as deviations from their grand means, the sum of the squares of their 
differences is 

$$S(x'_1 - x'_2)^2 = Sx'^2_1 + Sx'^2_2 - 2Sx'_1x'_2.$$

When we wrote a similar equation in section 6.1, we assumed the 
variates to be independent and the product term to be zero, but if 
the variates are not independent, we see from equation (7.4), p. 165, 
that 

$$Sx'_1x'_2 = r\sqrt{Sx'^2_1\,Sx'^2_2}.$$

Substituting in the above equation, dividing by (N - 1) and writing 
$\sigma$ with an appropriate suffix for the standard deviation, we find 

$$\sigma_{1-2}^2 = \sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2.$$

From this, putting r = 0 we obtain the formula for the standard error 
of a difference between two independent means; but in pairs of 
parallel series like the head breadths of termites, r is positive and 
real and the standard error of the difference is thus reduced. If on 
the other hand r were negative, the standard error of the difference 
would be increased. 
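
The identity can be verified numerically. The following Python 
sketch uses hypothetical paired data (generated at random, not taken 
from Table 6.8) in which a shared component makes the two series 
positively correlated; it compares the variance of the differences 
computed directly with the value given by the formula. 

```python
import math
import random

random.seed(0)
# Hypothetical paired data: a shared component induces positive correlation.
shared = [random.gauss(0, 1) for _ in range(200)]
x1 = [s + random.gauss(0, 0.5) for s in shared]
x2 = [s + random.gauss(0, 0.5) for s in shared]


def centred(values):
    """Deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


d1, d2 = centred(x1), centred(x2)
n = len(x1)
s11 = sum(a * a for a in d1)
s22 = sum(b * b for b in d2)
s12 = sum(a * b for a, b in zip(d1, d2))
r = s12 / math.sqrt(s11 * s22)            # correlation coefficient

var1, var2 = s11 / (n - 1), s22 / (n - 1)
var_diff_direct = sum((a - b) ** 2 for a, b in zip(d1, d2)) / (n - 1)
var_diff_formula = var1 + var2 - 2 * r * math.sqrt(var1 * var2)
```

With r positive, the variance of the difference falls below the sum 
of the two separate variances, which is the gain obtained from 
pairing. 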

The Further Analysis of Variance 
10.3. The whole question of the arrangement and interpretation of 
experiments is closely bound up with the analysis of variance, as 
we have shown in section 6.6. Usually there are more than two 
superimposed factors and the analysis may be complex. For example, 
when considering the variations in head breadths of termites of 
Table 6.8, we did two simple analyses into variations between and 
within nests, and between and within months. We see, however, 
that the nest and month variations are superimposed, and one 
analysis should separate the variations due to nests, months and 
residual random causes. Such an analysis will be developed in this 
section. 

Since there is a real variation between nests, and since each month 
is represented equally in each nest, we may eliminate the nest 
variations by expressing the breadths as deviations from their nest 
means as in Table 10.1. We may now analyse the variance of these 
deviations into two parts, between and within months; this is done 
in Table 10.2. The “total” sum of squares is, of course, the “within 
a nest” sum of Table 6.9, and arises from 20 degrees of freedom. 
The “between months” sum is the same as before (the elimination 
of the nest means has made no difference to the differences between 
the months), and the degrees of freedom are still 4; the final residual, 
with 16 degrees of freedom, is thus very small. We now test for 
association with the months by finding the z for the variances; it is 
1.16, and for $n_1 = 4$, $n_2 = 16$ lies well beyond the 0.01 point. Thus, 
by taking account of and eliminating the nest variations, we have 
established the month of year effect which previously appeared to 
be of doubtful significance. 

There is no reason why we should not start with the deviations 
from the monthly means, and analyse their variance into two parts, 
between and within nests. Such a process would give the same 
between-nest variance as Table 6.9, and the same residual within 
a nest as is given in Table 10.2 for within a month. This residual is 
common, and represents random deviations that cannot be eliminated. 
The significance of the nest variance then becomes very much greater 
even than before, for 

$$z = \tfrac{1}{2}[\log_e 0.03436 - \log_e 0.00231] = 1.35,$$

and $n_1 = 4$ and $n_2 = 16$. We have thus analysed the total variations 
into three parts, between nests, between months and a residual; 
each of these can be given a place in a larger table of analysis with 
four rows. 

Mean Head Breadths of Termites (Small Soldiers) 

Source of Variation    Sum of Squares    Degrees of Freedom    Variance 
Between months         0.094046          4                     0.02351 
Within a month         0.036921          16                    0.00231 
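
Both values of z follow directly from the variances. A minimal 
Python sketch is given below; the function name `fisher_z` is ours, 
and the inputs are the between-months, between-nests and residual 
variances (0.02351, 0.03436 and 0.00231) used in this section. 

```python
import math


def fisher_z(larger_variance, smaller_variance):
    """Fisher's z: half the difference of the natural logarithms of two variances."""
    return 0.5 * (math.log(larger_variance) - math.log(smaller_variance))


z_months = fisher_z(0.02351, 0.00231)   # months against the residual
z_nests = fisher_z(0.03436, 0.00231)    # nests against the residual

print(round(z_months, 2), round(z_nests, 2))
```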

The algebraic relations may be expressed by the equation, 

$$(x - \bar{x}) = (\bar{x}_s - \bar{x}) + (\bar{x}_t - \bar{x}) + d, \qquad (10.4)$$

where $x$ is an individual reading for the sth row and the tth 
column,* 

$\bar{x}_s$ is the mean for the sth row, 
$\bar{x}_t$ is the mean for the tth column, 
$\bar{x}$ is the grand mean, and 
$d = x - \bar{x}_s - \bar{x}_t + \bar{x}$ is the residual deviation. 

Then squaring, and summing for all rows (s) and columns (t), we 
have 

$$SS(x - \bar{x})^2 = SS(\bar{x}_s - \bar{x})^2 + SS(\bar{x}_t - \bar{x})^2 + SSd^2,$$

since the sums of the product terms come to zero. These are the 
sums of squares that go into the analysis table. If there are $n_t$ rows 
and $n_s$ columns (i.e. $n_s$ observations in each row and $n_t$ in each column, 
total $N = n_s n_t$), the above equation becomes 

$$SS(x - \bar{x})^2 = n_s S(\bar{x}_s - \bar{x})^2 + n_t S(\bar{x}_t - \bar{x})^2 + SSd^2. \qquad (10.5)$$

Expressed in words, this equation states that the sum of squares 
of deviations of individual observations from the grand mean equals 
the sum of squares of deviations of row means multiplied by the 
number of readings in each row, plus the sum of squares of deviations 
of column means multiplied by the number of readings in each 
column, plus the sum of squares of residual deviations. 

These are entered in Table 10.3, and symbols are written for the 
variances. If $\sigma_s^2$, $\sigma_t^2$ and $\sigma_r^2$ are the squares of standard deviations 
of row, column and residual variations in the infinite population, 
we obtain as in equations (6.3) 

$$v_s \to n_s\sigma_s^2 + \sigma_r^2,$$
$$v_t \to n_t\sigma_t^2 + \sigma_r^2, \qquad (10.6)$$
$$v_r \to \sigma_r^2.$$


* The conception is generalised by writing row for month and column 
for nest. 

After the significances of the sources of variation have been 
established by the z test carried out on the values of v, it is advisable 
to calculate these estimates of the “corrected variances,” as we 
may call them, as measures of the relative importance of the different 
sources. The values of v depend on the number of observations 
per row or column, i.e. on arbitrary features of the sample and so 
are of no value in themselves for measuring the objective properties 
of the population. If a large number of observations were taken 
individually at random from a population comprised of many rows 
and columns, of which the few rows and columns treated in the 
analysis are a random sample, the variance between these individuals 
would tend to equal the value $\sigma_s^2 + \sigma_t^2 + \sigma_r^2$. If any factor were 
eliminated as a source of variation, this variance would be reduced 

TABLE 10.3 
Analysis of Variance 

Source of Variations   Sum of Squares                   Degrees of Freedom     Variance 
Rows                   $n_s S(\bar{x}_s - \bar{x})^2$   $n_t - 1$              $v_s$ 
Columns                $n_t S(\bar{x}_t - \bar{x})^2$   $n_s - 1$              $v_t$ 
Residual               $SSd^2$                          $N - n_s - n_t + 1$    $v_r$ 
Total                  $SS(x - \bar{x})^2$              $N - 1$                - 

by the amount of the corresponding variance. For the head breadth 
of termites, 

$$\sigma_r^2 \to 0.00231 \text{ (residual)},$$

$$\sigma_s^2 \to \frac{0.02351 - 0.00231}{5} = 0.00424 \text{ (months)},$$

$$\sigma_t^2 \to \frac{0.03436 - 0.00231}{5} = 0.00641 \text{ (nests)};$$

and the variations due to months and nests are seen to be 
comparatively important. Sometimes, however, when there are many 
rows or columns, an unimportant variation may be highly significant. 
To interpret these variances in terms of normal frequencies, reference 
may be made to section 2.52. 
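
The arithmetic of the corrected variances may be sketched in Python 
as follows. The helper `corrected_variance` is a name of our own, 
and the divisor 5 is the number of readings contributing to each 
month mean and to each nest mean in Table 6.8. 

```python
def corrected_variance(v_factor, v_residual, n_per_mean):
    """Estimate sigma^2 for a factor from the relation v -> n * sigma^2 + residual."""
    return (v_factor - v_residual) / n_per_mean


v_months, v_nests, v_residual = 0.02351, 0.03436, 0.00231
n_per_mean = 5  # five readings per month mean and per nest mean

sigma2_months = corrected_variance(v_months, v_residual, n_per_mean)
sigma2_nests = corrected_variance(v_nests, v_residual, n_per_mean)

print(round(sigma2_months, 5), round(sigma2_nests, 5))
```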

The estimates of $\sigma_s^2$ and $\sigma_t^2$ are subject to errors of random 
sampling, but since these corrected variances are differences between 
two independent estimates of variance, their sampling distributions 
are composite. We shall make no attempt to determine them here, 
since all the tests of significance that are usually necessary can be 
performed on the values of v. 

In computing the residual sum of squares for Table 10.3, it is 
troublesome to find the actual residual deviations, d, as we have 
done above, and provided the arithmetic is carefully checked, it is 
sufficient to find the sums of squares for the rows, columns and 
total, and to obtain the residual by subtraction. The procedure is 
facilitated by extending the methods of section 6.3, performing the 
following operations: 

(1) Sum the squares of the individual observations, 

(2) Sum the squares of the row totals and divide the sum by the 
number of individuals in each row, 

(3) Sum the squares of the column totals and divide the sum by 
the number of individuals in each column, 

(4) Square the grand total and divide by the grand total number 
of observations. 

Then it may easily be shown that the sums of squares required for 
Table 10.3 are: 

For rows . . the result of operation (2) minus that of (4), 

For columns . . (3) - (4), 

For the total . . (1) - (4). 

The observations may all be measured from some arbitrary origin, 
e.g. 2.200 mm. for the head breadths of termites, so that the 
quantities to be squared have only three or four significant figures, and 
the computations may all be performed easily with the aid only of 
a table of squares. 
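
The four operations can be sketched in Python as below. The function 
name and the small table of readings are hypothetical, chosen only 
to illustrate the arithmetic; the table is assumed to have one 
reading in every cell. 

```python
def two_way_sums_of_squares(table):
    """Sums of squares for a rows x columns table with one reading per cell.

    Follows operations (1) to (4): the raw sum of squares, the row totals,
    the column totals and the correction term from the grand total.
    """
    n_rows = len(table)
    n_cols = len(table[0])
    n_total = n_rows * n_cols

    op1 = sum(x * x for row in table for x in row)
    op2 = sum(sum(row) ** 2 for row in table) / n_cols      # n_cols readings per row
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    op3 = sum(c * c for c in col_totals) / n_rows           # n_rows readings per column
    op4 = sum(col_totals) ** 2 / n_total

    ss_rows = op2 - op4
    ss_cols = op3 - op4
    ss_total = op1 - op4
    ss_residual = ss_total - ss_rows - ss_cols              # obtained by subtraction
    return {"rows": ss_rows, "columns": ss_cols,
            "residual": ss_residual, "total": ss_total}


# Hypothetical readings, measured from an arbitrary origin.
example = [[3, 5, 4], [6, 9, 6], [2, 4, 3]]
ss = two_way_sums_of_squares(example)
```

Shifting every reading by the same arbitrary origin leaves all four 
sums of squares unchanged, which is why the origin may be chosen to 
keep the figures short. 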

10.31. A statistical study of the variations in a population is useful 
for establishing a sampling scheme, as suggested in section 10.13, 
and in industrial practice for suggesting in what directions it would 
be profitable to attempt to control quality. It is usually possible to 
collect a sample in which the sources of variation are represented 
in some balanced way as in Table 6.8 and to carry out an analysis 
of variance. This represents the type of control by selection and 
arrangement referred to in the Introduction to this book. Analyses 
carried out in this connection may be very complicated, and it will 
be instructive to deal with the following example in detail. 

The data in Table 10.4 are from a paper by Gould and Hampton 
(1936).* From a single “pot,” about eighteen cylinders of spectacle 
glass are made on any one day or “journey.” Two pots are heated 
together in the same furnace and glass may be made on several 
consecutive days from the same pair of pots, forming a “run.” The 
quality measured, or variate, is the mean number of seed per unit 
area of glass, and this is given in Table 10.4 for three cylinders 
from each pot (the third, tenth and sixteenth in order of manufacture) 
and for the first five journeys in each of four runs. There 
is no significance in the numbering of the pots, and the runs are 
independent, being separated by several weeks and sometimes 
referring to different furnaces. The separate figures are given in 
the table, and various totals† have also been computed. 

Now each kind of manufacturing unit — cylinder, pot, journey or 
run — is a potential source of variation, since the causes that control 
quality may be consistently different for each unit. Further, since 
the journeys are consecutive days and the cylinders are in order of 
manufacture, there may be trends in quality. Clearly, the variations 
in density of seed shown in Table 10.4 are the result of the super- 
imposition of variations from many possible sources. Precisely what 
are these sources ? Which have statistically significant effects ? What 
is their relative importance? To answer these questions we shall 
analyse the variations in Table 10.4 completely. 

It would be possible to do this analysis in one step writing down 
equations analogous to equations (10.4) and (10.5), but the notation 
would be complicated and the process would be very difficult to 
follow. We shall break Table 10.4 up into a number of simpler 
tables of kinds that have already been dealt with, and analyse these 
step by step. Readers are warned that they will find this analysis 
very difficult to follow unless they have “at their finger-tips” the 
simpler analyses dealt with so far. 

First let us consider the fifteen readings in one pot-run, say the 
first pot and first run. This is statistically similar to Table 6.8 

* From Table IV of their paper. We give here only four of the five runs 
given there, so as to make the analysis easier to follow. It might lead to 
some ambiguity in the argument if five runs and five journeys were retained. 

† These totals are not total frequencies as given, say, in Table 6.1. They 
are the sums of the qualities of the individuals included in the totals. 

TABLE 10.4 

Mean Number of Seed per Unit Area 

TABLE 10.5 
Analysis of Variance 

Source of Variation                          Sum of Squares   Degrees of Freedom   Variance 

(a) Within Pots 
 (1) Between cylinders                       22 132.53        16                   1 383.28** 
 (2) Between journeys                        37 916.66        32                   1 184.90** 
 (3) Residual within pots                    18 398.14        64                     287.47 
 (4) Total                                   78 447.33        112                  - 

(b) Between Pots 
 (5) Between runs                            13 679.90        3                    4 559.96 
 (6) Residual between pots                   10 099.57        4                    2 524.89** 
 (7) Total                                   23 779.47        7                    3 397.07** 

(c) Between Cylinders 
 (8) Common to all runs                       9 132.88        2                    4 566.44 
 (9) Common to both pots in run, less (8)    11 532.72        6                    1 922.12 
     (8) and (9) together                    20 665.60        8                    2 583.20** 
 (10) Specific to pot                         1 466.93        8                      183.37 
 (11) Total                                  22 132.53        16                   - 

(d) Between Journeys 
 (12) Common to all runs                      9 684.00        4                    2 421.00 
 (13) Common to both pots in run, less (12)  18 650.07        12                   1 554.17 
      (12) and (13) together                 28 334.07        16                   1 770.88** 
 (14) Specific to pot                         9 582.59        16                     598.91* 
 (15) Total                                  37 916.66        32                   - 

(p. 137), and the variation may be divided into three parts: between 
cylinders with two degrees of freedom, between journeys with four 
degrees, residual with eight degrees, and there are fourteen degrees 
for the total. This process may be carried out for the eight pot-runs, 
and the sums of squares and degrees of freedom added, giving 
the data in the upper part of Table 10.5. 

Now consider the pot means. There are 
eight readings giving seven degrees of freedom and the sums of 
squares may be divided into two parts: between runs with three 
degrees and a residual within runs with four degrees. To obtain 
sums of squares comparable with those within pots, we should 
multiply the sums of squares obtained from the pot means by the 
number of readings in a pot, 15, just as in equation (10.5) we 
multiplied the squares of the row and column deviations by the 
number in a row, $n_s$, or column, $n_t$. This is equivalent to stating 
that the squared deviations should be summed over all the original 
individuals. When this process is carried out, the results in section (b) 
of Table 10.5 are obtained. 

TABLE 10.41 

             Cylinder 
             3        10       16 
Pot 1        49.6     61.8     98.0 
Pot 2        39.6     61.0     101.0 


The cylinder variations given in row (1) may be analysed further. 
The cylinder means for run 1 are in Table 10.41. This is like 
Table 6.8, and the variance may be analysed as in Table 10.3 into 
parts associated with pots (one degree), a cylinder variation common 
to both pots (two degrees) and a residual (two degrees) which 
represents an extra cylinder effect that is specific to each pot and 
is due to the deviations of the separate cylinder readings for each 
pot from the cylinder readings common to both pots. Similar 
analyses may be carried out for the four runs, and the sums of 
squares and degrees of freedom added. When this is done and the 
resulting sums of squares are multiplied by five, to make them 
comparable with section (a) of Table 10.5, the results are as shown 
in rows (6), (8) and (9) together, and (10) of that table. It would 
be just as reasonable, statistically, to describe the residual in 
row (10) as an extra pot effect that is specific to each cylinder, and 
to show impartiality, we may call the source of variation an 
interaction between cylinders and pots. From a practical point of view, 
however, it seems more appropriate in this instance to concentrate 
attention on the cylinder variations. 

Yet another subdivision of the cylinder effect common to both 
pots in a run (rows 8 and 9) is possible. The cylinder means common 
to both pots in each of the four runs are in Table 10.42. The total 
sum of squares between these means has 11 degrees of freedom 
and may be divided into parts associated with runs (3 degrees), 
cylinders common to all runs (2 degrees) and a residual (6 degrees) 
representing a differential cylinder trend for each run but common 
to the two pots in the run, i.e. a cylinder-run interaction. The sums 
of squares of these items multiplied by 10 and the degrees of freedom 
are entered in rows (5), (8) and (9) of Table 10.5. 

TABLE 10.42 

             Cylinder 
             3        10       16 
Run 1        44.6     61.4     99.5 
Run 2        59.2     72.5     56.3 
Run 3        33.8     45.2     47.7 
Run 4        35.7     55.6     52.0 

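
The subdivision of Table 10.42 can be retraced with a short Python 
sketch. It is illustrative only. The cylinder-10 means for runs 2 
and 3 are taken here as 72.5 and 45.2, an assumption on our part: 
these are the values that agree with the run total of 1 880 and the 
sum of squares of 20 665.60 quoted later in this section. Because 
the means are rounded to one decimal place, the results agree with 
rows (5), (8) and (9) of Table 10.5 only to within a few hundredths. 

```python
# Cylinder means common to both pots: rows are runs 1-4, columns cylinders 3, 10, 16.
# Assumed readings for cylinder 10 in runs 2 and 3: 72.5 and 45.2 (see the lead-in).
means = [
    [44.6, 61.4, 99.5],
    [59.2, 72.5, 56.3],
    [33.8, 45.2, 47.7],
    [35.7, 55.6, 52.0],
]
readings_per_mean = 10   # five journeys x two pots

n_runs, n_cyl = len(means), len(means[0])
grand = sum(sum(row) for row in means) / (n_runs * n_cyl)
run_means = [sum(row) / n_cyl for row in means]
cyl_means = [sum(row[j] for row in means) / n_runs for j in range(n_cyl)]

ss_total = sum((x - grand) ** 2 for row in means for x in row)
ss_runs = n_cyl * sum((m - grand) ** 2 for m in run_means)
ss_cyl = n_runs * sum((m - grand) ** 2 for m in cyl_means)
ss_interaction = ss_total - ss_runs - ss_cyl

# Multiply by 10 so that the deviations are summed over the original readings.
rows = {
    "(5) between runs": readings_per_mean * ss_runs,
    "(8) cylinders common to all runs": readings_per_mean * ss_cyl,
    "(9) cylinder-run interaction": readings_per_mean * ss_interaction,
}
for label, value in rows.items():
    print(label, round(value, 2))
```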

In a precisely similar way the journey-variations of row (2), 
Table 10.5, may be further analysed into the parts shown in 
section (d) of that table. 

Before testing the variances in Table 10.5 for significance, it will 
be well to determine of what corrected variances they are estimates. 
Let the variances entered in the table be denoted by v with a 
subscript corresponding to the row of the table, i.e. $v_1$, $v_2$, etc., 
and let the corresponding corrected variances be denoted by $\sigma^2$ with 
a corresponding subscript. Thus, $\sigma_1^2$ is the variance that would be 
found between the cylinder means within each pot if an infinite 
number of observations could be made on each cylinder and there 
were an infinite number of pots. Further, let the number of the 
original individuals contributing to the mean of each individual 
factor be n with an appropriate subscript. For example, $n_1$ is the 
number of readings per cylinder within each pot, = 5, $n_9$ is the 
number of readings per cylinder mean for each run, = 10, and 
so on. Then the values of the n's and the relations between the 
observed and corrected variances, expressed in the manner of 
equations (10.6), are as follows: 

$n_1 = 5$         $v_1 \to n_1\sigma_1^2 + \sigma_3^2$ 
$n_2 = 3$         $v_2 \to n_2\sigma_2^2 + \sigma_3^2$ 
$n_3 = 1$         $v_3 \to \sigma_3^2$ 
$n_5 = 30$        $v_5 \to n_5\sigma_5^2 + v_6$ 
$n_6 = 15$        $v_6 \to n_6\sigma_6^2 + \sigma_3^2$ 
$n_8 = 40$        $v_8 \to n_8\sigma_8^2 + v_9$ 
$n_9 = 10$        $v_9 \to n_9\sigma_9^2 + v_{10}$ 
$n_{10} = 5$      $v_{10} \to n_{10}\sigma_{10}^2 + \sigma_3^2$ 
$n_{12} = 24$     $v_{12} \to n_{12}\sigma_{12}^2 + v_{13}$ 
$n_{13} = 6$      $v_{13} \to n_{13}\sigma_{13}^2 + v_{14}$ 
$n_{14} = 3$      $v_{14} \to n_{14}\sigma_{14}^2 + \sigma_3^2$ 


We shall now proceed to discuss the derivation of these expressions. 

The relations between $v_1$, $v_2$ and $v_3$ are the same as those between 
the variances of Table 10.3, except that we have combined the 
estimates of several tables. It will be noted that the cylinder and 
journey effects may vary from one pot or run to another and $\sigma_1^2$ and 
$\sigma_2^2$ may themselves be complex. The residual variance $\sigma_3^2$ is due 
partly to the fact that only a limited number of sections on each 
cylinder were examined for seed and partly to a real variation that 
could not be associated with any factor. 

The relation between $v_5$ and $v_6$ follows from equation (10.6), and 
from the fact that after performing the analysis on the pot means, 
we multiplied the sums of squares by the number of individuals 
per pot. Thus, when the analysis is performed on the pot means, 
the $n$ of equation (10.6) is the number of pots per run = 2; but 
we multiplied the sums of squares by the number of readings per 
pot, = 15, so in the above equation, $\sigma_5^2$ is multiplied by the product 
of these which is $n_5 = 30$. The residual variance measured between the 
pot means is the sum of two parts; it is the corrected variance 
between pots, $\sigma_6^2$, plus the square of the standard error or variance 
due to the fact that the residual variations within a pot are only 
sampled by $n_6$ = 15 individuals; this added variance is $\sigma_3^2/15$. The 
variance $v_6$ is $n_6$ times this measured variance between pot means. 
The cylinder and journey effects do not contribute to the error in 
a pot mean, since they are expressed as deviations from the pot 
mean. 

The relations between $v_8$ and $v_9$, between $v_9$ and $v_{10}$, between 
$v_{12}$ and $v_{13}$, and between $v_{13}$ and $v_{14}$ may easily be deduced in a 
similar manner; these variances have all been found from analyses 
like that in Table 10.3. The apparent variance $v_{10}$ is contributed 
to only by the part of the cylinder trend that is specific to the pot, 
$\sigma_{10}^2$, and by the first residual, $\sigma_3^2$, and the relation given above is 
easily deduced. Similar remarks apply to the journey effect. 

We are now in a position to test the variances given in Table 10.5 
for statistical significance, using Fisher’s z-test. If there is no 
cylinder effect, $\sigma_1^2 = 0$ and $v_1 = v_3$; the test for this effect consists 
in establishing the statistical significance of the difference between 
$v_1$ and $v_3$. Similarly, $v_2$ should be compared with $v_3$, $v_5$ with $v_6$, 
$v_6$ with $v_3$, $v_8$ with $v_9$, and so on. When these tests are carried out, 
the values of z corresponding to the variances marked with two 
asterisks are above the 1 per cent. point and that with one asterisk 
is between the 5 and 1 per cent. points; we shall presume that 
these show the existence of significant effects;* all the other values 
of z are below the 5 per cent. point. 

The variances $v_1$ and $v_2$ are both significantly greater than $v_3$; 
if this had not been so, we should still have been justified in 
performing the further analyses of sections (c) and (d) of Table 10.5. 

The variance $v_5$ is not significantly greater than $v_6$, so it is 
reasonable to combine these to give an improved estimate of the 
residual variance (b) between pots, based on 7 degrees of freedom; 
this is in row (7) of Table 10.5. Similarly, $v_8$ is not significantly 
greater than $v_9$ and it is reasonable to work on the conclusion that 
the cylinder effect varies from run to run and is measured by the 
combined variance based on 8 degrees. This combined estimate 
of $v_9$ is significantly greater than $v_{10}$. We see, however, that $v_{10}$ is 
not greater than $v_3$, and conclude that the cylinder effect does not 
vary from pot to pot within a run. Similar conclusions may be 
reached regarding the journey variances, except that $v_{14}$ is greater 
than $v_3$. We are justified in combining rows (10) and (3) to obtain 
an improved estimate of $\sigma_3^2$ based on 72 degrees of freedom; it 
is 275.90. 

* The one variance with one asterisk has a value of z very little below 

The significant effects and their corrected variances estimated 
from the apparent variances of Table 10.5 may be summarised as 
follows: 

Differences between pots, 

$$\sigma_6^2 \to \frac{2\,524.89 - 275.90}{15} = 149.93.$$

Cylinder effect varying from run to run, 

$$\sigma_9^2 \to \frac{2\,583.20 - 275.90}{10} = 230.73.$$

Journey effect, part due to variation from run to run, 

$$\sigma_{13}^2 \to \frac{1\,770.88 - 598.91}{6} = 195.33.$$

Journey trend, part due to variation from pot to pot within a run, 

$$\sigma_{14}^2 \to \frac{598.91 - 275.90}{3} = 107.67.$$

Residual (unaccounted variation), 

$$\sigma_3^2 \to 275.90.$$
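
These estimates may be reproduced from the entries of Table 10.5 
with the following Python sketch. The helper `corrected` and the 
dictionary labels are our own; the divisors are the numbers of 
readings per mean (15, 10, 6 and 3) taken from the relations given 
earlier in this section. 

```python
# Pooled residual: rows (3) and (10) of Table 10.5 combined, 64 + 8 degrees of freedom.
pooled_residual = (18398.14 + 1466.93) / (64 + 8)


def corrected(v, v_error, n):
    """Corrected variance from a relation of the form v -> n * sigma^2 + v_error."""
    return (v - v_error) / n


estimates = {
    "pots": corrected(2524.89, pooled_residual, 15),
    "cylinder effect, run to run": corrected(2583.20, pooled_residual, 10),
    "journey effect, run to run": corrected(1770.88, 598.91, 6),
    "journey trend, pot to pot": corrected(598.91, pooled_residual, 3),
}
for label, value in estimates.items():
    print(label, round(value, 2))
```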

These estimates of corrected variance measure the relative impor- 
tance of the several factors in the way described in section 10.3. 
The estimates are valid, however, only in so far as the third, tenth 
and sixteenth cylinders are representative of the eighteen or so 
that may be made from a pot, and the first five journeys are repre- 
sentative of all the journeys possible in a run, in so far as there is 
any secular variation. For example, all the cylinders between the 
third and tenth could have fewer and those between the tenth and 
sixteenth could have more seed than the three measured, and this 
would be a source of variation entirely overlooked by our analysis, 
of which random errors take no account. We shall assume the repre- 
sentativeness of the cylinders and journeys. 

All the above values of the true variances are estimates subject 
to random errors, since the cylinder and journey effects vary from 
run to run or from pot to pot, and the few that have provided the 
estimates are a random sample from an infinite population of trends 
obtained from an infinite population of runs. Had there been a 
significant cylinder effect common to all runs, it would not have 
been subject to random errors in quite the same way. However 
many runs there had been, there would have been but two degrees 
of freedom for $v_8$; the effect of a large number of pots is merely to 
increase $n_8$ in the equation 

$$v_8 \to n_8\sigma_8^2 + v_9$$

and reduce the error, thus reducing the effect of random errors 
in $v_8$ in estimating $\sigma_8^2$. Similar remarks apply to the journey effect. 

We have developed the foregoing analysis by treating means: pot 
means, cylinder means, and so on. The computation is much easier, 
however, if corresponding totals are dealt with in the way outlined 
at the end of section 10.3. The general rule is first to square and 
add the various totals and divide by the number of individuals 
forming each total, producing a series of “adjusted sums.” Then 
the sum of squares entered in Table 10.5 for any factor measured 
as a series of deviations from the means corresponding to a second 
factor is the “adjusted sum” for the first factor minus that for the 
second. As an example, let us find the sum of squares entered in 
Table 10.5, below rows (8) and (9). The first factor is represented 
by the run-cylinder totals and the second by the run totals (i.e. 
in the analysis, run-cylinder means were measured as deviations 
from run means for all cylinders). There are 10 observations per 
run-cylinder total and 30 per run total; the adjusted sums are: 

$$(446^2 + 614^2 + 995^2 + 592^2 + \ldots) \div 10 = 401\,205.70$$

and 

$$(2\,055^2 + 1\,880^2 + \ldots) \div 30 = 380\,540.10,$$

and the sum of squares is the difference between these, viz.: 
20 665.60. 
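
The rule of adjusted sums is easily mechanised. In the Python 
sketch below, the function `adjusted_sum` is a name of our own, and 
the run-cylinder totals are taken as ten times the means of 
Table 10.42, with the cylinder-10 entries for runs 2 and 3 read as 
72.5 and 45.2; that reading is an assumption, chosen because it 
agrees with the run totals 2 055 and 1 880 quoted above. 

```python
def adjusted_sum(totals, n_per_total):
    """Sum of the squared totals divided by the number of readings in each total."""
    return sum(t * t for t in totals) / n_per_total


# Run-cylinder totals (10 readings each), in the order run 1 to run 4,
# cylinders 3, 10 and 16 within each run. Assumed from Table 10.42.
run_cylinder_totals = [446, 614, 995, 592, 725, 563, 338, 452, 477, 357, 556, 520]

# Run totals (30 readings each): the sum of each run's three cylinder totals.
run_totals = [sum(run_cylinder_totals[i:i + 3]) for i in range(0, 12, 3)]

# First factor (run-cylinder) minus second factor (run).
ss_rows_8_and_9 = adjusted_sum(run_cylinder_totals, 10) - adjusted_sum(run_totals, 30)
print(round(ss_rows_8_and_9, 2))
```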


Arrangements for Agricultural and Analogous Experiments 
10.4. In no subject has the application of the principles we have 
just introduced for reducing the errors of comparisons of treatments 
been so complete as in that of agricultural experiment. 
The actual statistical technique is capable of application to other 
subjects, and it illustrates well the analysis of complex variations; 
for these reasons as well as because of the practical importance 
of the question itself, we shall deal explicitly with the arrangement 
of plots for field trials. There is an example of an application to 
another subject in section 10.5. 

It is well known that there are regional changes in natural fertility 
over areas of land, and that if varieties of corn (for example) are
compared by sowing one large plot with each and measuring the
yields, there is a possibility that any measured differences may be
due to variations in soil fertility rather than to varieties of corn. 


TABLE 10.6

Yields of Grain in Grammes
(Plots 7½ × 2½ ft.)

           1      2      3      4      5

   1     360    324    306    274    277
   2     287    324    282    252    277
   3     274    302    247    234    256
   4     316    304    232    261    271
   5     354    364    299    302    301
   6     295    294    298    313    268
   7     197    201    236    209    217
   8     335    320    259    205    207
   9     267    262    279    332    268
  10     291    300    350    383    264


For that reason the necessity of repeating the comparisons over 
several plots so that the effect of soil variations can be assessed is 
well recognised. It is true, also, that these variations often exhibit 
trends so that if small plots are taken the average fertility of neigh- 
bouring ones tends to be alike; the art of planning an experiment 
lies in arranging that the varieties or conditions under investigation
(we shall call them treatments) occur in neighbouring plots, so that 
the larger regional changes in fertility are eliminated from the 
comparison. It is desirable also that the standard error of the differ- 
ences due to the uneliminated variations in fertility may be estimated 
so that the significance of any difference can be tested. 

For the purposes of studying soil variations, many uniformity
trials have been conducted, and the data of Table 10.6, taken from a
paper by Christidis (1931), are the results of one conducted at
Cambridge. They are the yields of corn from fifty adjacent plots of
7½ × 2½ feet treated uniformly, and harvested separately. If we
wished to compare five treatments (say), we could distribute them
at random over the fifty plots. In such case the standard error of
the mean for any one treatment would be $\sqrt{v/10}$, where $v$ is the
residual variance between plots after treatments have been
eliminated, and that of the difference between two means would be
$\sqrt{2}$ times this; here the treatments are the same, and the residual
variance for such an arrangement is the mean variance for the fifty
yields and equals 95 940/49 = 1 958.
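
As a small numerical sketch of these two standard errors (in Python; the variable names are illustrative only), taking the sum of squares 95 940 and its 49 degrees of freedom from the text:

```python
import math

total_ss, total_df = 95_940.0, 49
v = total_ss / total_df            # mean variance of the fifty yields
plots_per_treatment = 10           # five treatments over fifty plots

se_mean = math.sqrt(v / plots_per_treatment)   # one treatment mean
se_diff = math.sqrt(2) * se_mean               # difference of two means

print(round(v), round(se_mean, 1), round(se_diff, 1))
```

Both standard errors are in grammes per plot, the unit of Table 10.6.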

10.41. We might expect to eliminate the variations between rows,
however, by ensuring that every treatment occurs once in each row,
so that only the residual within-row variance contributes to the errors
of the differences. Labelling the treatments A, B, C, D and E, such
an arrangement as the following could be tried,

A B C D E

A B C D E

We have analysed the variance of the yields of Table 10.6 into
three portions as in Table 10.3: variations between rows, between
treatments arranged as shown and a residual within rows; the results
are in Table 10.7. There are ten rows giving nine degrees of freedom,
five treatments giving four degrees, fifty plots giving forty-nine
degrees for the total and leaving thirty-six for the residual. We know,
however, that the five treatments are all the same, and so may combine
the treatment and residual sums of squares to give a within-row
variance; we see that by eliminating the row variations, the
variance is reduced from 1 958 to 1 218; i.e. a comparison based on
about 12 replicates properly arranged with one of each treatment
per row is as good as one based on about 20 perfectly random
replicates. In practice, if the treatments differ, their significance
is tested on the basis of the residual variance of 1 063; in this example
the treatments variance (2 616) paradoxically seems greater than the
residual. For the difference z = 0.45, and being only just below the
5 per cent point, is suggestive (although not strongly) of a real
effect. This is because we have neglected a fundamental principle:
the plots are not randomly distributed within the restrictions
imposed, and the treatment differences coincide with column
differences. The arrangement would have been sound if we had settled
the order of the plots in each row separately by chance (say by
drawing lettered tickets from a hat), and the residual variance (1 063 in
Table 10.7) would not have been very different from the variance
within rows (1 218); the reader is recommended to try it out on
the data.

TABLE 10.7

Analysis of Variance of Crop Yields

Source of Variation           Sums of Squares   Degrees of Freedom   Variance

Between rows                      47 201.6              9             5 245
Between treatments                10 462.6              4             2 616
Residual                          38 275.8             36             1 063
Treatments and residual           48 738.4             40             1 218
Total                             95 940.0             49             1 958
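
The arithmetic behind Table 10.7 can be followed in the Python sketch below. It uses the yields of Table 10.6 and the systematic order A B C D E in every row, so that the “treatment” totals are simply the column totals; the helper name `adjusted_sum` is an illustrative choice, and z is taken as half the natural logarithm of the variance ratio.

```python
import math

yields = [
    [360, 324, 306, 274, 277], [287, 324, 282, 252, 277],
    [274, 302, 247, 234, 256], [316, 304, 232, 261, 271],
    [354, 364, 299, 302, 301], [295, 294, 298, 313, 268],
    [197, 201, 236, 209, 217], [335, 320, 259, 205, 207],
    [267, 262, 279, 332, 268], [291, 300, 350, 383, 264],
]

def adjusted_sum(totals, n_per_total):
    """Sum of squared totals divided by the readings in each total."""
    return sum(t * t for t in totals) / n_per_total

n_rows, n_cols = len(yields), len(yields[0])
grand = sum(map(sum, yields))
correction = grand ** 2 / (n_rows * n_cols)

ss_total = sum(y * y for row in yields for y in row) - correction
ss_rows = adjusted_sum([sum(row) for row in yields], n_cols) - correction
# Same order in every row: treatment totals coincide with column totals.
ss_treat = adjusted_sum([sum(col) for col in zip(*yields)], n_rows) - correction
ss_resid = ss_total - ss_rows - ss_treat

var_treat = ss_treat / (n_cols - 1)
var_resid = ss_resid / ((n_rows - 1) * (n_cols - 1))
var_within_rows = (ss_treat + ss_resid) / (n_rows * (n_cols - 1))
z = 0.5 * math.log(var_treat / var_resid)

print(round(ss_rows, 1), round(ss_treat, 1), round(ss_resid, 1))
print(round(var_resid), round(var_within_rows), round(z, 2))
```

To try a sound arrangement, the reader may replace the column totals by totals taken over a treatment order drawn afresh by chance in each row.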

Other systematic arrangements have been devised to overcome 
such obvious defects as shown by the one just condemned, and of 
them the “chessboard” had been very popular. This places every
treatment as close to every other as possible, and an arrangement 
for seven treatments in rows of five is here shown. 

A B C D E 

F G A B C 

D E F G A 

B C D E F 

It will be seen that the treatments may be assigned in the same 
order, row after row, and yet they are all represented in every column. 
If there are as many treatments as plots per row, each one may be 
two places to the right of its position in the previous row, viz. 

A B C D E 

D E A B C 

B C D E A 

E A B C D 

C D E A B.

The usual method of treating the results is to divide them into groups
containing all the treatments in the order in which they occur; in 
the seven treatments example above, the first group would contain 
the first row and the first two plots of the second, the second would 
contain the last three plots in the second row and the first four in 
the third, and so on; for five treatments, the groups would be rows. 
Then the total variance is analysed into three parts — between groups, 
between treatments and a residual from which the standard error of 
a difference may be calculated. The danger of column variations
giving spurious effects is eliminated, but such arrangements do 
produce regular, if more complex patterns, and there is a remote 
chance of the soil varying periodically and more or less in step with 
the pattern. Further, the arrangement does not make the best pos- 
sible comparison, for, taking the set of five treatments, the groups 
are rows, and the residual variance (which for Table 10.7 is 1 218)
includes column differences, although they do not enter into the 
comparisons. The estimates of errors in such systematic arrangements 
are thus not valid. 
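
The two systematic patterns shown above are easily generated, as in the following Python sketch; the function names are illustrative only. The first assigns the treatments in the same running order, row after row; the second moves each treatment a fixed number of places to the right in each successive row.

```python
def running_order(treatments, plots_per_row, n_rows):
    """Treatments follow one another in the same order, row after row."""
    t = len(treatments)
    return [[treatments[(r * plots_per_row + c) % t]
             for c in range(plots_per_row)]
            for r in range(n_rows)]

def shifted_rows(treatments, shift):
    """As many treatments as plots per row; each row is shifted to the right."""
    t = len(treatments)
    return [[treatments[(c - shift * r) % t] for c in range(t)]
            for r in range(t)]

for row in running_order("ABCDEFG", plots_per_row=5, n_rows=4):
    print(" ".join(row))
print()
for row in shifted_rows("ABCDE", shift=2):
    print(" ".join(row))
```

Such patterns are regular by construction, which is exactly the defect discussed in the text; the sketch is given only to make the layouts explicit.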

Restricted Random Arrangements of Plots
Random Groups

10.42. For large numbers of treatments it is better to have groups of
as many plots as there are treatments, arranged in clumps rather
than in rows, and to assign the position of each treatment within
the group by chance. The random arrangement of five treatments in
the rows of Table 10.6 is a special case of this, and the analysis of
the yields is the same as in Table 10.7.*

* The results may be written in a table of rows and columns, each row
referring to a group and each column to a treatment; then the terms of
Table 10.3 apply directly.

The Latin Square

The application of the Latin square arrangement to field experiments
is due to Fisher (1936).† If there are n treatments to be
compared, a block of plots having n rows and n columns is marked
out, and the treatments are distributed at random, with the restriction
that each one must occur once in each row and once in each
column. The field of Table 10.6 has 5 × 10 plots, and so is suitable
for the formation of two Latin squares; we give an example below.
The variance of such an arrangement may be fairly simply analysed
into five parts: (1) between 2 blocks (1 degree of freedom),
(2) between 10 rows (the deviations are measured from 2 block
means, giving 8 degrees of freedom), (3) between columns (we regard
the 10 columns of the 2 blocks as contributing separately to the
variance, giving 8 degrees of freedom), (4) between treatments
(4 degrees of freedom) and (5) a residual with 28 degrees of freedom,
giving a total of 49 degrees.

† The first edition of this book appeared in 1925.

           1    2    3    4    5

   1       B    E    A    D    C
   2       D    A    E    C    B
   3       E    D    C    B    A
   4       A    C    B    E    D
   5       C    B    D    A    E
   6       D    B    C    A    E
   7       A    C    E    D    B
   8       B    A    D    E    C
   9       E    D    B    C    A
  10       C    E    A    B    D

Equation (10.4) may be extended to
give equation (10.7), where the same notation is used, the suffixes r
referring to blocks, s to rows, t to columns and u to treatments.

$(x - \bar{x}) = (\bar{x}_r - \bar{x}) + (\bar{x}_s - \bar{x}_r) + (\bar{x}_t - \bar{x}_r) + (\bar{x}_u - \bar{x}) + d.$   (10.7)

The rows and columns are measured as deviations from their block
means, the blocks and treatments from the grand mean. When this
is squared and summed, because of the restriction that each treatment
is represented equally in each row and so on, the product terms
become zero, and we have equation (10.8), which is parallel to (10.5)
and may be entered in a table of analysis of variance.

$S(x - \bar{x})^2 = n^2 S(\bar{x}_r - \bar{x})^2 + n S(\bar{x}_s - \bar{x}_r)^2 + n S(\bar{x}_t - \bar{x}_r)^2 + 2n S(\bar{x}_u - \bar{x})^2 + S(d^2),$   (10.8)

where n is the number of rows per block.

These terms are the sums of squares of deviations of the various 
means from the block or grand mean, multiplied by the number of 
observations contributing to those various means. The actual
arithmetic is far less complicated than the algebra, and if the reader will
work through an example, he will soon gain confidence.

When there is an awkward number of treatments (like 6), so that 
the means are not correct when calculated to one or two decimal 
places, the following arithmetical scheme, extended from that in 
section 10.3, is convenient: 

(1) Square every plot yield and sum,

(2) Find the total yields for the blocks, square and sum them
    and divide by the number of plots per block,

(3) Find the total yields for the rows, square and sum them and
    divide by the number of plots per row,

(4) Repeat this for columns,

(5) Repeat this for treatments, and

(6) Square the grand total yield and divide by the total number
    of plots.

Then the terms in the following are equivalent to the corresponding
ones in equation (10.8),

$[(1) - (6)] = [(2) - (6)] + [(3) - (2)] + [(4) - (2)] + [(5) - (6)] + S(d^2).$

The data of Table 10.6, with the above arrangement of imaginary
treatments, have been reduced in this way, and the analysis is in
Table 10.71. The treatment variance, although greater than the
final residual, is not significantly so (z = 0.38 while z = 0.50
lies on the 5 per cent point), and we will combine them and regard
the mean variance 866 based on 32 degrees of freedom as the residual.
The row and column variances are both greater than the residual
(the z's lie above the 5 per cent point), and the arrangement has
reduced the residual variance to 866 (i.e. 9 replicates so arranged
are as good as 20 at random).
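
The six-step arithmetical scheme may be followed in the Python sketch below, applied to the yields of Table 10.6 with the two Latin squares of imaginary treatments given earlier (rows 1 to 5 forming one block and rows 6 to 10 the other). The names `sq_sum`, `t1` to `t6` and `ss` are illustrative only.

```python
yields = [
    [360, 324, 306, 274, 277], [287, 324, 282, 252, 277],
    [274, 302, 247, 234, 256], [316, 304, 232, 261, 271],
    [354, 364, 299, 302, 301], [295, 294, 298, 313, 268],
    [197, 201, 236, 209, 217], [335, 320, 259, 205, 207],
    [267, 262, 279, 332, 268], [291, 300, 350, 383, 264],
]
layout = ["BEADC", "DAECB", "EDCBA", "ACBED", "CBDAE",
          "DBCAE", "ACEDB", "BADEC", "EDBCA", "CEABD"]
n = 5                                    # rows (and columns) per block
blocks = [range(0, n), range(n, 2 * n)]  # row indices of each block

def sq_sum(totals, per_total):
    """Square and sum the totals, then divide by the plots per total."""
    return sum(t * t for t in totals) / per_total

t1 = sum(y * y for row in yields for y in row)                    # step (1)
t2 = sq_sum([sum(sum(yields[r]) for r in b) for b in blocks], n * n)  # (2)
t3 = sq_sum([sum(row) for row in yields], n)                      # step (3)
t4 = sq_sum([sum(yields[r][c] for r in b)
             for b in blocks for c in range(n)], n)               # step (4)
treat_total = {}
for r, letters in enumerate(layout):
    for c, letter in enumerate(letters):
        treat_total[letter] = treat_total.get(letter, 0) + yields[r][c]
t5 = sq_sum(treat_total.values(), 2 * n)                          # step (5)
t6 = sum(map(sum, yields)) ** 2 / (2 * n * n)                     # step (6)

ss = {"blocks": t2 - t6, "rows": t3 - t2,
      "columns": t4 - t2, "treatments": t5 - t6}
ss["residual"] = (t1 - t6) - sum(ss.values())
pooled = (ss["treatments"] + ss["residual"]) / 32   # combined residual

for name, value in ss.items():
    print(f"{name:10s} {value:10.1f}")
print("pooled variance", round(pooled, 1))
```

The residual is found by subtraction, as in the last term of the identity above.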

The block variance may not be compared with the residual, 
because the rows and columns are not common to both blocks, and 
their variations contribute to the block differences. 

For comparing treatment means, the standard error of each mean
is $\sqrt{866/2n}$, where 2n is the number of plots per treatment. In
making such comparisons of several means, due regard must be
paid to the general considerations advanced in section 3.33.

It may be mentioned in passing that Latin square arrangements
are useful in many fields of experiment and have also been used
for separating sources of variation not subject to experimental
control, as was done in section 10.31. The “treatments,” the rows
and the columns may represent three superimposed factors, and
there may be several squares, each with different individual
representatives of the factors. Then the three corrected variances are of
more interest than the means for the “treatments” and their
standard errors.

TABLE 10.71

Analysis of Variance of Crop Yields

Source of Variation           Sums of Squares   Degrees of Freedom   Variance

Blocks                             3 698.0              1             3 698
Rows                              43 503.6              8             5 438
Columns                           21 042.4              8             2 630
Treatments                         6 452.6              4             1 613
Residual                          21 243.4             28               759
Treatments and residual           27 696.0             32               866
Total                             95 940.0             49             1 958

Every separate experiment must be arranged in its own, independently
and randomly chosen Latin square. To form a square,
a number of cards may be lettered, one for each treatment, and
shaken up in a hat. Then the letters may be assigned to the position
in a row in the order in which the corresponding cards are drawn.
The whole procedure may be repeated for the second and subsequent 
rows. Proceeding in this way, however, it will usually be found that 
after three or four rows, one or two letters appear more than once in a 
column. Whenever this happens, the row should be re-drawn until
the restriction of one treatment per column is satisfied. In six-by-six 
or larger squares, a good deal of adjustment may be necessary and 
this may allow some scope for the unconscious introduction of a 
slight degree of system in the arrangement, which may tend to 
repeat itself if one experimenter has to make many squares. After 
a Latin square has been designed, however, the rows and columns 
may be rearranged in some random order fixed by drawing numbered
cards from a hat, and if this is done it is hardly likely that any
serious lack of randomness will survive in the scheme. However, 
workers who have to form many Latin squares are advised to use 
such methods as described by Yates (1933a). 
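
The card-drawing procedure just described may be imitated as in the Python sketch below. It is only an illustration of drawing rows, re-drawing any row that repeats a letter in a column, and finally rearranging rows and columns at random; it is not the method of Yates, and it is not claimed to choose among all possible squares with equal chance. The function names are hypothetical.

```python
import random

def random_latin_square(treatments, rng=None):
    """Draw rows from a 'hat', re-drawing any row that clashes in a column."""
    rng = rng or random.Random()
    n = len(treatments)
    rows = []
    while len(rows) < n:
        draw = rng.sample(treatments, n)          # cards in the order drawn
        # Accept the row only if no letter repeats in any column so far.
        if all(draw[c] != earlier[c] for earlier in rows for c in range(n)):
            rows.append(draw)
    # Rearrange the rows and the columns in random order.
    rng.shuffle(rows)
    order = rng.sample(range(n), n)
    return [[row[c] for c in order] for row in rows]

def is_latin(square):
    """Every letter occurs once in each row and once in each column."""
    letters = set(square[0])
    return (all(set(row) == letters for row in square)
            and all(set(col) == letters for col in zip(*square)))

square = random_latin_square(list("ABCDE"), random.Random(1))
for row in square:
    print(" ".join(row))
```

For six-by-six or larger squares many rows are rejected before an admissible one is drawn, which is the computational counterpart of the “good deal of adjustment” mentioned in the text.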

Use of Controls

10.43. One common method of comparing a number of experimental
treatments when there are disturbing variations is to repeat
one of the treatments, a “control,” with each of the others, under
conditions as similar as possible. Then, the differences between the
treatment and control values are measured, and these differences are 
used to compare the treatments. In order that the precision of such 
an experiment may be measured, the treatment-control pairs must 
be replicated, and the relative positions of the treatment and control 
within each pair must be fixed independently and at random. 

For example, let us suppose we wish to compare five treatments 
A B C D E on the plots referred to in Table 10.6, repeating A as 
a control with each of the others in a contiguous plot in the same 
row. Then we can only use four columns and the arrangement may 
be like the following: 


Col. 1    Col. 2    Col. 3    Col. 4

  B         A         A         C
  A         D         E         A
  A         B         C         A
  D         A         E         A
 etc.      etc.      etc.      etc.


The differences between the treatments and controls may be found;
if the variance within the treatments of these differences is v and
the number of replicates of each treatment is n, the standard error
of the mean difference between any treatment and the control is
$\sqrt{v/n}$, while that of the mean difference between any two of the
treatments B to E is $\sqrt{2v/n}$.

Since the data of Table 10.6 are the result of a uniformity trial,
v may be estimated as twice the variance within the pairs, and from
the formula at the end of section 5.1 (p. 111), we find that v = 737.6.
The standard error of the difference between any two of the four
treatment effects is therefore $\sqrt{1\,475.2/n}$ and 8n plots have been
used. The standard error of the difference between treatment A
and the other four is $\sqrt{737.6/n}$. If the four treatments and control
were distributed among 8n plots in the random arrangements, there
would be 8n/5 replicates and the standard error of the difference
between any two treatments would be

$\sqrt{\dfrac{2 \times 1\,958 \times 5}{8n}}$.

The method using a control is, in this instance, more efficient than
the random method. On the other hand, similar calculations show
the Latin square method to be more efficient than either except for
comparing treatment A with B, C, D or E.
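
These comparisons may be checked with the Python sketch below. It assumes that the pairs are formed from columns 1 and 2 and from columns 3 and 4 of each row of Table 10.6, and it takes v as the mean square of the twenty pair differences (the treatments being all alike, the true mean difference is zero). The variances 1 958 and 866 are those found earlier; all names are illustrative.

```python
yields = [
    [360, 324, 306, 274, 277], [287, 324, 282, 252, 277],
    [274, 302, 247, 234, 256], [316, 304, 232, 261, 271],
    [354, 364, 299, 302, 301], [295, 294, 298, 313, 268],
    [197, 201, 236, 209, 217], [335, 320, 259, 205, 207],
    [267, 262, 279, 332, 268], [291, 300, 350, 383, 264],
]

# Contiguous pairs in the same row: columns (1, 2) and (3, 4).
diffs = [row[i] - row[i + 1] for row in yields for i in (0, 2)]
v = sum(d * d for d in diffs) / len(diffs)

# Squared standard errors of a difference, each to be divided by n,
# when 8n plots are used in every arrangement.
control_two_treatments = 2 * v               # any two of B to E
control_with_a = v                           # A against any other
random_arrangement = 2 * 1958 * 5 / 8        # 8n/5 replicates at random
latin_square = 2 * 866 * 5 / 8               # 8n/5 replicates in squares

print(v, control_two_treatments, random_arrangement, latin_square)
```

The smaller the figure, the more efficient the arrangement for the comparison in question.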

The use of a control may provide very accurate comparisons if 
it is technically convenient to bring each treatment and its control 
into very close proximity. This is achieved in Beaven’s “Half-drill
strip” method in which adjacent long narrow strips are assigned to
the control and treatment in pairs. Often, the arrangement is such 
as to sandwich two strips of treatments between two of controls. 
This practice is to be deprecated, however, as it violates the principle 
of randomness; within each pair of strips the relative positions of 
the treatment and control should be decided by chance. 

Multiple Factor Experiments 

10.5. A multiple factor or factorial experiment is one in which
several factors are varied together. For example, in an experiment
arranged to investigate the effect of an ingredient of the size mixture
used for protecting cotton warps in weaving, there were four
treatments in which there were two forms of this ingredient, A and B,
and each was used in the size in two concentrations: 1 and 2 units.
The treatments were:

(i) 1 unit of ingredient A

(ii) 2 units of ingredient A

(iii) 1 unit of ingredient B

(iv) 2 units of ingredient B.

Further, the experiment was done twice on two separate qualities
of yarn, X and Y. Thus there were three factors: form of ingredient,
quantity of ingredient and quality of yarn. The quality measured
on the warps was the rate at which the warp threads broke during
weaving, the rate being expressed as the number per 10 000 picks.*

* One “pick” corresponds to the insertion of one weft thread.






Although the three factors were varied together in the same 
experiment, the arrangement is balanced so that both ingredients 
were present in both quantities and were used on both yarns. Con- 
sequently, the deviations of the two mean breakage rates for ingre- 
dients A and B from the grand mean are unaffected by anything 
but the forms of ingredient and errors. These mean deviations are 
said to measure the main effect of ingredients. Similarly, the main
effects of the quantities of ingredient and of the yarn qualities may 
easily be measured, each being undisturbed by the other two factors. 

TABLE 10.8

Warp Breaks per 10 000 Picks and Treatments

                          Yarn X

Loom              7            8            9            10

Period 1      1.4 (iv)     5.3 (i)      7.4 (ii)     3.2 (iii)
Period 2      2.2 (iii)    4.1 (ii)     8.8 (i)      1.6 (iv)
Period 3      2.1 (ii)     2.2 (iv)     4.8 (iii)    4.7 (i)
Period 4      2.4 (i)      2.3 (iii)    2.6 (iv)     4.3 (ii)

                          Yarn Y

Loom              15           16           17           18

Period 5      2.8 (iii)    1.5 (i)      1.9 (ii)     2.0 (iv)
Period 6      1.5 (iv)     1.2 (ii)     2.0 (i)      1.8 (iii)
Period 7      2.5 (ii)     1.4 (iv)     2.4 (iii)    2.0 (i)
Period 8      5.8 (i)      1.9 (iii)    1.7 (iv)     3.2 (ii)

It is possible, however, that the effect of varying the quantity of 
ingredient may be different for form A than that for form B, or 
alternatively, the difference in response of warp breakage rate to 
A and B may depend on the quantity of each present. If this is so, 
there is said to be an interaction between form and quantity of 
ingredient. There may be interactions between any pair of the three 
factors. If such interactions exist, it may also be that the effect of 
increasing the quantity, say, from 1 to 2 units may be different
according to whether the ingredient is A on yarn X, A on Y,
B on X or B on Y. If such an effect exists, there is said to be a
second-order interaction between the three factors.

The main effects and interactions may be measured and tested
by an analysis of variance on the results, and with this in view we 
shall deal with results of the above experiments in full. 

There were four warps* of each yarn, one treatment being used
for each warp. The warps of yarn X were woven in four looms,
and the total weaving time was divided into four periods. Between
the periods, the warps were interchanged among the looms in such
a way as to complete a Latin square arrangement with periods as
rows and looms as columns. The warps of yarn Y were woven in
separate looms and periods from X, in another Latin square. The
arrangement and warp breakage rates are given in Table 10.8.

* A warp is a unit quantity of yarn sized uniformly.

First let us analyse these data as we did the crop yields in the
double Latin square of section 10.42, except that since we are
considering the possibility of yarn-treatment interactions we will keep
separate the four treatments in each block, giving eight duplicated
treatments with six degrees of freedom. The analysis is in rows
(1) to (6) of Table 10.9. Rows (2) to (5) would be obtained if the
two Latin squares were analysed separately and the results combined.
The variance due to yarns in Table 10.9 may not be compared
with the residual term, since loom and period variations contribute
to the yarn differences. The variance due to treatments in row (4)
is significantly greater than the residual, so we may analyse the
treatment effects further.

TABLE 10.9

Analysis of Variance of Warp Breakage Rates

Source of Variation               Sums of Squares   Degrees of Freedom   Variance

(1) Yarns                             17.701               1             17.701
(2) Periods                           10.348               6              1.725
(3) Looms                             36.763               6              6.127
(4) Treatments                        28.978               6              4.830
(5) Residual                           8.539              12              0.712
(6) Total                            102.329              31

Main Effects
(7) Quantity                           5.120               1              5.120
(8) Ingredient                        17.111               1             17.111

Interactions
(9) Quantity-Ingredient                0.046               1              0.046
(10) Quantity-Yarn                     0.320               1              0.320
(11) Ingredient-Yarn                   6.302               1              6.302
(12) Quantity-Ingredient-Yarn          0.079               1              0.079

The yarn effects may be eliminated by combining the results of
the two squares. When this is done, the totals of the breakage rates
for the four basic treatments are:

                 Ingredient
                 A        B
1 unit         32.5     21.4
2 units        26.7     14.4

This is a 2 × 2 table of the kind analysed in section 10.3, and may
be analysed into parts due to quantity, ingredient and the residual,
each with one degree of freedom. Here the residual is the interaction
between quantity and ingredient. For inclusion in Table 10.9
and comparison with the sums of squares given there, the sums of
squares of this subsidiary analysis must be divided by 8, the number
of readings per total. The same results would be obtained if the
analysis were performed on the treatment means and the sums of
squares were multiplied by eight. This is equivalent to performing
the summations over all individual observations, and follows the
lines of previous analyses. The results are in rows (7), (8) and (9)
of Table 10.9.

To obtain the quantity-yarn interactions, we may eliminate the 
ingredient effects and obtain the following totals : 

                 Yarn
                 X        Y
1 unit         33.7     20.2
2 units        25.7     15.4

The analysis of these totals with the sums of squares divided by
eight gives variances due to: yarn [row (1) of Table 10.9], quantity
[row (7)] and the quantity-yarn interaction [row (10)].

Similarly, the ingredient-yarn interaction entered in row (11) of
Table 10.9 may be obtained from the following totals:

                 Ingredient
                 A        B
Yarn X         39.1     20.3
Yarn Y         20.1     15.5

The second-order interaction between the three factors is most
easily obtained by subtracting the treatment effects so far determined
from the total effect in row (4). This is given in row (12).
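
The subdivision of the treatment sum of squares may be sketched in Python as below. The breakage rates are those of Table 10.8, read so that the totals agree with the 2 × 2 tables quoted in the text. Instead of analysing each 2 × 2 table in turn, the sketch uses contrasts with signs +1 and -1, which for factors at two levels gives the same sums of squares; this device, and all the names used, are choices made for the illustration and not the procedure of the text. Small differences in the third decimal place from Table 10.9 arise from rounding.

```python
# (rate, treatment) by period (rows) and loom (columns), for each yarn.
table = {
    "X": [[(1.4, "iv"), (5.3, "i"), (7.4, "ii"), (3.2, "iii")],
          [(2.2, "iii"), (4.1, "ii"), (8.8, "i"), (1.6, "iv")],
          [(2.1, "ii"), (2.2, "iv"), (4.8, "iii"), (4.7, "i")],
          [(2.4, "i"), (2.3, "iii"), (2.6, "iv"), (4.3, "ii")]],
    "Y": [[(2.8, "iii"), (1.5, "i"), (1.9, "ii"), (2.0, "iv")],
          [(1.5, "iv"), (1.2, "ii"), (2.0, "i"), (1.8, "iii")],
          [(2.5, "ii"), (1.4, "iv"), (2.4, "iii"), (2.0, "i")],
          [(5.8, "i"), (1.9, "iii"), (1.7, "iv"), (3.2, "ii")]],
}
form = {"i": "A", "ii": "A", "iii": "B", "iv": "B"}
units = {"i": 1, "ii": 2, "iii": 1, "iv": 2}

# Totals of four readings for each yarn-form-quantity combination.
cell = {}
for yarn, rows in table.items():
    for row in rows:
        for rate, t in row:
            key = (yarn, form[t], units[t])
            cell[key] = cell.get(key, 0.0) + rate

def effect_ss(use_yarn, use_form, use_units):
    """Sum of squares (1 degree of freedom) from a +1/-1 contrast."""
    contrast = 0.0
    for (yarn, f, u), total in cell.items():
        sign = 1
        if use_yarn:
            sign *= 1 if yarn == "Y" else -1
        if use_form:
            sign *= 1 if f == "B" else -1
        if use_units:
            sign *= 1 if u == 2 else -1
        contrast += sign * total
    return contrast ** 2 / 32          # 32 readings in all

ss_yarn = effect_ss(True, False, False)
ss_quantity = effect_ss(False, False, True)
ss_ingredient = effect_ss(False, True, False)
ss_q_i = effect_ss(False, True, True)
ss_q_y = effect_ss(True, False, True)
ss_i_y = effect_ss(True, True, False)
ss_q_i_y = effect_ss(True, True, True)

# Row (4): treatments within yarns, then row (12) by subtraction.
yarn_total = {y: sum(v for k, v in cell.items() if k[0] == y) for y in table}
ss_treatments = (sum(v * v for v in cell.values()) / 4
                 - sum(t * t for t in yarn_total.values()) / 16)
by_subtraction = ss_treatments - (ss_quantity + ss_ingredient
                                  + ss_q_i + ss_q_y + ss_i_y)

print(round(ss_quantity, 3), round(ss_ingredient, 3), round(ss_i_y, 3))
print(round(ss_q_i_y, 3), round(by_subtraction, 3))
```

The last line shows that the second-order interaction found directly from its contrast agrees with that found by subtraction from row (4).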

The variances in rows (7) to (12) of Table 10.9 are due to the 
various treatment effects plus the experimental error, so that they 
may all be tested against the variance in row (5) for significance. 
The main effects of quantity and ingredient, and the interaction 
between ingredient and yarn are significant, and we may express
the final results by the following mean values of the breakage rates
per 10 000 picks:

1 unit: 3.4;  2 units: 2.6;
Yarn X: Ingredient A 4.9, Ingredient B 2.5;
Yarn Y: Ingredient A 2.5, Ingredient B 1.9.

For comparing the quantities, since there are 16 readings per mean,
the standard error of the difference is $\pm\sqrt{2 \times 0.712/16} = \pm 0.30$,
and the corresponding standard error of a difference between
ingredients on one yarn is $\pm\sqrt{2 \times 0.712/8} = \pm 0.42$. The difference
between ingredients on yarn Y is not significant.
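
A short Python sketch of these standard errors, using the residual variance 0.712 of Table 10.9 and the ingredient-yarn totals quoted above, is given below; the variable names are illustrative only.

```python
import math

residual_variance = 0.712                       # row (5) of Table 10.9

# Quantities: 16 readings per mean. Ingredients on one yarn: 8 per mean.
se_quantity = math.sqrt(2 * residual_variance / 16)
se_ingredient_one_yarn = math.sqrt(2 * residual_variance / 8)

# Mean differences between ingredients A and B on each yarn (totals / 8).
diff_on_x = (39.1 - 20.3) / 8
diff_on_y = (20.1 - 15.5) / 8

print(round(se_quantity, 2), round(se_ingredient_one_yarn, 2))
print(round(diff_on_x / se_ingredient_one_yarn, 1),
      round(diff_on_y / se_ingredient_one_yarn, 1))
```

The second line of output expresses each difference as a multiple of its standard error, which makes plain how much larger the difference is on yarn X than on yarn Y.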

Thus, the final conclusions from these experiments are that the 
use of two units of either ingredient instead of one reduces the 
warp breakage rate slightly on both yarns, that on yarn X
ingredient A gives a considerably higher breakage rate than B and that
on yarn Y there is little or no difference in response to the two 
ingredients. It is interesting to note that the breakage rate is higher 
on yarn X than Y, and although we have not tested to see whether
this is due to the yarns or to loom and period variations, it may be 
that the higher breakage rate is necessary to give sensitivity to the 
form of ingredient. 

The computations for this example may be readily performed by 
the method of section 10.3, squaring and summing the various totals 
and dividing by the numbers of original observations in the totals. 
Before combining the results of the two Latin squares, it is well to 
perform the analyses separately to see if the two residual variances 
differ significantly. If they do, the tables should not be combined. 
We have ascertained that the two residuals are substantially the 
same, but have not given the results. 

When possible, it is advisable to have more variations of each 
factor than were used in this experiment. With so few degrees of
freedom only marked effects are significant. More factors may be
superimposed and higher order interactions investigated. Fortu- 
nately interactions of a high order are seldom significant; if they 
were often of importance experimental investigations would be very 
cumbersome and generalisations would be difficult. 

Statistically, an interaction between factors appears as a residual 
in an analysis, and in the same way any residual may be regarded 
as an interaction. For example the residual term in Table 10.9 
arising from the first analysis of Table 10.8 is in fact due to experi- 
mental errors and interactions between treatments, looms and 
periods. If a trial were made with uniformly sized warps, the 
residual would be less than that of Table 10.9 by the variance due to 
these interactions. Where such interactions exist, the assumptions 
underlying the analysis of a Latin square are not satisfied and it 
is advisable to arrange the treatments in random groups. 

General Discussion 

10.6. Technical literature, particularly that relating to agriculture,
horticulture, forestry, animal breeding and other branches of applied 
biology, abounds in examples of the application of the foregoing 
arrangements and developments of them to practical experiments. 
A fairly detailed account of the subject treated from the points of 
view of the agriculturist is given by Sanders and Wishart (1935). 
Practically the whole of this experimental technique owes its inspira- 
tion to Professor R. A. Fisher, and readers are strongly recommended 
to refer to his book The Design of Experiments (1936b). There,
Professor Fisher deals with the fundamental principles of experi- 
mental procedure and treats of various methods in detail. 

Of the possible methods of arrangement, the most suitable will 
depend on the particular field of inquiry, on technical considera- 
tions and on the nature and extent of the variations in the untreated 
subject of experiment. In this connection, statistical surveys or 
uniformity trials are very valuable. 

Results obtained from any particular experiment are subject to 
the obvious limitation that they do not necessarily apply under
conditions other than those obtaining for the trial. For example, in
a cereal variety trial, variety A may be significantly better than
variety B, but it does not follow that A will always be better than
B, e.g. in different localities and weathers, nor even that A will be
better than B on the average. If it is desired to know whether A
gives a better yield than B on the average for all years and for
several localities, the trial must be repeated in those localities and
for several years, and the mean difference between the varieties be
compared with a standard error obtained, not from a mean residual
variance within the trials but from a residual variance between the
trials. This point needs emphasis, because it may happen in practice
that a highly significant result based on a single trial has only a
narrow range of application.

This is one reason why multiple-factor experiments such as that 
exemplified in section 10.5 are advocated. The average effect of 
increasing the amount of the ingredients investigated there is 
established not only for one ingredient, but for two ingredients 
and yarns, and the applicability of the results is correspondingly 
widened. 

In all the arrangements we have considered in this chapter, the 
various factors have balanced so that as far as possible each indi- 
vidual of any one factor is associated once with each individual of 
another. This has made the analysis much easier than it would 
otherwise have been. Equations (10.5) and (10.8) were much sim- 
plified because certain product terms were zero, so that the deviations 
due to each factor could easily be separated and estimated inde- 
pendently. This property is termed orthogonality and is highly 
desirable. 
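
The vanishing of the product terms can be seen numerically. The Python sketch below takes the first block of Table 10.6 (rows 1 to 5) with its Latin square of imaginary treatments, and sums over all plots the product of the row deviation and the treatment deviation; it then repeats the sum for a deliberately unbalanced layout in which one plot of C has been relabelled D. The unbalanced layout and the function name are hypothetical, introduced only for the illustration.

```python
yields = [
    [360, 324, 306, 274, 277], [287, 324, 282, 252, 277],
    [274, 302, 247, 234, 256], [316, 304, 232, 261, 271],
    [354, 364, 299, 302, 301],
]
latin = ["BEADC", "DAECB", "EDCBA", "ACBED", "CBDAE"]
unbalanced = ["BEADD", "DAECB", "EDCBA", "ACBED", "CBDAE"]  # hypothetical

def cross_product(layout):
    """Sum over plots of (row mean - grand mean)(treatment mean - grand mean)."""
    plots = [(r, layout[r][c], yields[r][c])
             for r in range(5) for c in range(5)]
    grand = sum(y for _, _, y in plots) / len(plots)

    def mean_by(index):
        groups = {}
        for plot in plots:
            groups.setdefault(plot[index], []).append(plot[2])
        return {k: sum(v) / len(v) for k, v in groups.items()}

    row_mean, treat_mean = mean_by(0), mean_by(1)
    return sum((row_mean[r] - grand) * (treat_mean[t] - grand)
               for r, t, _ in plots)

print(round(cross_product(latin), 6))
print(round(cross_product(unbalanced), 2))
```

With the balanced square the product term is zero apart from rounding in the machine arithmetic; with the unbalanced layout it is not, and the row and treatment deviations can no longer be separated so simply.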

Sometimes, however, accidents occur and individual readings are 
lost. In many such instances the missing readings may be estimated 
by the method of least squares, so that the analysis of variance is 
still possible. In some experiments, too, a complete arrangement 
involving every combination of all factors may be unwieldy and 
unnecessary. Then certain combinations may be omitted, and the 
corresponding treatments are said to be confounded. For example, 
in the arrangement of Table 10.8, the yarn effect is confounded with
the loom and period effects, and there is no possibility of measuring
the interactions. This aspect is dealt with by Fisher (1936b), and
the treatment of a number of incomplete and confounded arrange- 
ments is worked out in a number of papers by Yates (1933 and 
1936). 

Whether the experimental arrangement is balanced or unbalanced, 
it should be carefully designed, otherwise serious inefficiency may 
result or two important effects may be confounded and so be 
unmeasurable. The time to consult statistical principles is before 
the experiment is planned, not after the results are obtained and 
are in confusion. 

It is well to emphasise again that all the tests of significance and 
estimates of corrected variance in this chapter are based on the 
assumption of a homogeneous residual variation; i.e. it is assumed 
that the residual variance is, within the limits of random errors, the 
same for all blocks, rows, columns and factors that are combined. 



CHAPTER XI 


MULTIPLE AND PARTIAL CORRELATION AND 
REGRESSION 


Partial Correlation 

11.1. We showed in the last chapter how the heterogeneity of 
variability has an important effect on the theory of errors, and in 
this chapter we shall study its effect on correlation. 

TABLE 11.1 

Grain and Straw Yields 

Treatment      Block 1         Block 2         Block 3         Block 4 
             Grain  Straw    Grain  Straw    Grain  Straw    Grain  Straw 
    A         620    242      646    321      681    261      644    317 
    B         644    267      745    382      542    201      711    316 
    C         523    215      713    330      686    298      688    381 
    D         601    213      693    292      685    265      714    255 
    E         664    322      693    370      666    284      516    323 
    F         514    200      637    261      697    259      710    361 
    G         550    260      708    318      663    266      673    340 
    H         521    203      661    275      594    207      730    331 

Treatment      Block 5         Block 6         Block 7         Block 8 
             Grain  Straw    Grain  Straw    Grain  Straw    Grain  Straw 
    A         706    255      615    331      552    216      726    295 
    B         705    280      637    285      543    200      646    309 
    C         692    300      612    294      635    256      748    284 
    D         699    238      697    309      701    283      746    324 
    E         656    232      663    393      657    351      683    363 
    F         633    234      595    258      697    306      712    376 
    G         671    362      626    400      655    276      671    385 
    H         625    229      644    266      745    276      747    328 

Table 11.1 shows the mean grain and straw yields for each of 
64 plots (the units do not concern us here), there being eight different 
manurial treatments and eight replicates on each. The data are 
selected from a paper by Eden and Fisher (1927), and are merely 
used here for illustrative purposes; readers who are interested in the 
agricultural aspect must consult the original paper. Further, we 
shall for the moment ignore the fact that the sample is small with 
comparatively large sampling errors, and shall treat the statistical 
constants as though they were almost exact. The treatments were 
distributed at random within blocks and we have in Table 11.2 
analysed the variance of grain and straw yields into three parts; the 
fifth, eighth and ninth columns should be ignored for the moment. 
The block variation is much greater than the residual for both 
grain and straw, and so is the treatment variation for straw, while 
the variance of grain yields is less for the treatments than for 
the residual (although not significantly so); thus the variations in 
the yields of Table 11.1 are heterogeneous. Now in order to find the 
relation between grain and straw yields, we may correlate the sixty-four 
pairs of readings, and the crude correlation of the actual readings 
gives a coefficient of +0.524, indicating a positive relationship. 
This, however, is not the whole story; the variations are produced 
by three groups of causes, changes in soil fertility, changes in treat- 
ments and that complex of unknown causes which we call random, 
and it is unlikely that the relationship between grain and straw 
yields is the same for the three types of variation. Indeed, in 
a general way, plots that produce most grain might be expected 
to produce most straw; but the treatments which had an effect 
on the straw yield had none on the grain, and this relationship 
cannot hold for treatment variations. It is thus necessary to 
separate out the effects and to find several correlations, and this 
is done by a fairly straightforward extension of the analysis of 
variance. 

Let us suppose first that the block and residual variations are 
due to the same thing (soil variability), and that we wish to separate 
them from those due to manurial treatments, and to correlate them. 
This may be done quite straightforwardly by finding the 64 pairs 
of deviations from the treatment means and correlating them, by 
finding the sums of their products and squares. We may also correlate 
the eight pairs of treatment means (or totals). Using the same notation 
as before, but calling the grain yield y and the straw yield x, 
the values of an individual pair of readings in the sth treatment as 
deviations from the grand means are 

$$(x - \bar{x}) = (x - \bar{x}_s) + (\bar{x}_s - \bar{x})$$ 
and 
$$(y - \bar{y}) = (y - \bar{y}_s) + (\bar{y}_s - \bar{y}), \qquad (11.1)$$ 

and their product summed for the sth treatment is 

$$S'(x - \bar{x})(y - \bar{y}) = S'(x - \bar{x}_s)(y - \bar{y}_s) + n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}) + (\bar{y}_s - \bar{y})\,S'(x - \bar{x}_s) + (\bar{x}_s - \bar{x})\,S'(y - \bar{y}_s),$$ 

where $n_s$ is the number of plots per treatment. The last two terms 
are zero since $S'(x - \bar{x}_s)$ and $S'(y - \bar{y}_s)$ are sums of deviations 
from the mean and are zero, so this equation, when summed further 
over all treatments, gives 

$$S(x - \bar{x})(y - \bar{y}) = SS'(x - \bar{x}_s)(y - \bar{y}_s) + S\,n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}). \qquad (11.11)$$ 

The term on the left-hand side is the sum of products used in 
correlating the crude deviations from the grand mean, the first term 
on the right-hand side is used in correlating the deviations from the 
treatment means and the second term in correlating the treatment 
means themselves. This equation is exactly parallel to equation (6.6) 
on p. 132 for the analysis of variance, and may similarly be entered 
in a table of analysis of co-variance. The degrees of freedom are 
reckoned up in the same way, and the sum of products divided by the 
number of degrees of freedom gives the mean product or co-variance. 
The sums of squares and variances can be entered in the same 
table, and the co-variance of any cause of variation when divided 
by the square root of the product of the variances [equation (7.4), 
p. 165] gives the corresponding correlation coefficient. 

We can deal with the co-variance in exactly the same way as 
the variance, using all the arguments and equations previously used, 
except that co-variance may sometimes be negative while variance 
must always be positive. We saw in the last chapter that variance 
may be analysed into more than two parts, and we will now use the 
same methods to analyse the co-variance of the yields of Table 11.1 
into three parts. For a single plot in the sth treatment and tth block, 

$$(x - \bar{x}) = (\bar{x}_s - \bar{x}) + (\bar{x}_t - \bar{x}) + x',$$ 
$$(y - \bar{y}) = (\bar{y}_s - \bar{y}) + (\bar{y}_t - \bar{y}) + y',$$ 

where x' and y' are residual deviations; and by a similar argument 
to that used above, summing over all treatments and blocks, 

$$S(x - \bar{x})(y - \bar{y}) = S\,n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}) + S\,n_t(\bar{x}_t - \bar{x})(\bar{y}_t - \bar{y}) + Sx'y', \qquad (11.2)$$ 

where $n_s$ is the number of plots in a treatment and $n_t$ the number 
in a block. This is exactly parallel to equation (10.5), p. 215,* and the 
terms have been entered in Table 11.2 under the sums of products 
column, and when divided by the degrees of freedom, as co-variances. 
For convenience of computing, equation (7.61), p. 166, 
may be applied to find the terms of the above; i.e. 

$$S(x - \bar{x})(y - \bar{y}) = Sxy - N\bar{x}\bar{y},$$ 
$$S\,n_s(\bar{x}_s - \bar{x})(\bar{y}_s - \bar{y}) = S\,n_s\bar{x}_s\bar{y}_s - N\bar{x}\bar{y},$$ 

and so on; but particular care must be taken of signs, as co-variance 
may be negative. The correlation coefficients in Table 11.2 have 
been found from the sums of squares and products, and whereas 
the coefficient for block and residual variations is about +0.6, for 
the treatment variations it is −0.3; the difference is important. 
The residual deviations are independent of block and treatment 
differences, and their correlation coefficient is a partial coefficient 
with both block and treatment factors eliminated. The eight treat- 
ment totals are independent of the blocks and the block totals are 
independent of the treatments, but both are to some extent affected 
by the residual deviations, and although their correlation coefficients 
are in some degree partial, only one factor has been eliminated. 
However, as there are several plots per block and treatment, the 
effects of these two factors predominate over the residual in the 
correlation coefficients. 
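
The decomposition of equation (11.2) can be checked numerically. The following is a minimal sketch in Python, assuming a small made-up layout of three treatments in four blocks; the yields are hypothetical and are not taken from Table 11.1, and the function name `decompose` is only illustrative. 

```python
import math

# Hypothetical yields: one row per treatment, one column per block.
straw = [[24.0, 31.0, 27.0, 35.0],
         [20.0, 29.0, 25.0, 30.0],
         [28.0, 36.0, 30.0, 38.0]]
grain = [[60.0, 70.0, 66.0, 74.0],
         [58.0, 71.0, 63.0, 69.0],
         [57.0, 68.0, 65.0, 75.0]]


def decompose(u, v):
    """Split S(u - ubar)(v - vbar) into treatment, block and residual parts."""
    n_treat, n_block = len(u), len(u[0])
    cells = [(s, t) for s in range(n_treat) for t in range(n_block)]
    gu = sum(u[s][t] for s, t in cells) / len(cells)   # grand means
    gv = sum(v[s][t] for s, t in cells) / len(cells)
    us = [sum(row) / n_block for row in u]             # treatment means
    vs = [sum(row) / n_block for row in v]
    ut = [sum(u[s][t] for s in range(n_treat)) / n_treat for t in range(n_block)]
    vt = [sum(v[s][t] for s in range(n_treat)) / n_treat for t in range(n_block)]
    # Each treatment mean rests on n_block plots, each block mean on n_treat plots.
    treat = n_block * sum((us[s] - gu) * (vs[s] - gv) for s in range(n_treat))
    block = n_treat * sum((ut[t] - gu) * (vt[t] - gv) for t in range(n_block))
    resid = sum((u[s][t] - us[s] - ut[t] + gu) * (v[s][t] - vs[s] - vt[t] + gv)
                for s, t in cells)
    total = sum((u[s][t] - gu) * (v[s][t] - gv) for s, t in cells)
    return {"treatments": treat, "blocks": block, "residual": resid, "total": total}


products = decompose(straw, grain)    # sums of products
sq_straw = decompose(straw, straw)    # sums of squares of x
sq_grain = decompose(grain, grain)    # sums of squares of y

for source in ("treatments", "blocks", "residual", "total"):
    r = products[source] / math.sqrt(sq_straw[source] * sq_grain[source])
    print(f"{source:10s} sum of products {products[source]:9.3f}   r = {r:+.3f}")
```

The three component sums of products add up to the total, and each row yields its own correlation coefficient, which is the point of the analysis of co-variance table. 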

The maximum and minimum temperatures in Table 7.4, p. 147, 
are daily readings taken in the month of August for 49 years, and 
we may now be led to inquire if the variations between and within 
years are of the same nature. Fisher and Hoblyn give the sums of 
squares and products, and we repeat them in Table 11.3, together 
with the correlation coefficients.† The variances show that the 
between-year variations are real, and the correlation coefficients 
show that the association between maximum and minimum temperatures 
is stronger for these variations than for those within the 
year; that is to say, if we know the average maximum temperature 
for the month of August in any one year we can estimate the average 
minimum for that month with a little greater accuracy than we can 
the minimum for any one day, knowing the corresponding maximum. 

* If $n_s$ and $n_t$ are constant, they may be placed outside the summation sign. 

† The data of Table 11.3 have been calculated from Fisher and Hoblyn’s 
full correlation table and not from the condensed Table 7.4. 

Other facts are also presented by Table 11.3; the maximum temperature 
is a little more variable from day to day during the month 
than the minimum (the variances are 26113 and 19445), while 
between the monthly means for different years, the maximum is 
very much more variable than the minimum temperature. 

If these data for the 49 years were separated, and from the 49 
correlation coefficients a mean were found, using Fisher’s transformation 
as shown in section 8.22, it would be something like the 
value 0.256 of Table 11.3. When the full data are available, however, 
it is always better to add the 49 sums of products and squares 
separately, and to find the single correlation coefficient from them 
in the way we have just shown. 

Correction of Treatment Effect on one Factor for Variations due to 
Correlated Factor 

11.11. In an experiment like that measuring grain and straw yields 
in the previous section, it is sometimes desirable to know the effect 
of the treatments on one factor when the other is kept constant. 
Thus, the apparent effect of the treatments on grain yield measured 
by the variances in Table 11.2 is due not only to any possible direct 
effect of the treatments but also to the indirect effect of variations 
in the yield of straw, since straw and grain yields are positively 
correlated. It may be asked, “What is the effect of the treatments 
on grain yield when plots having the same straw yield are com- 
pared?” Such a question may be somewhat unreal in this instance, 
unless the straw yield may be taken as a measure of the relative 
fertility of the plots, but when the variable to be eliminated is the 
number of plants per plot, or the yield on the same plots resulting 
from a uniformity trial during a previous year, the practical relevance 
of such questions is more obvious. We shall show how the analysis 
of co-variance technique gives an answer to the question. 

The method is similar to those used in sections 8.33 to 8.35 for 
testing various hypotheses regarding regression constants. First, a 
single regression line is fitted to the data and the sum of squares of 
the residual deviations of grain yield is found. Then separate regression 
lines are fitted, one for each treatment, and the sum of squares 
of the residual deviations of grain yield from these lines is found. 
The degrees of freedom associated with the two sums of squares 
are also noted. Then, if the variance estimated from the reduction 
in residual sum of squares due to fitting the separate lines is significantly 
greater than the final residual variance from the separate lines, 
the treatments have a significant effect on grain, when corrected for 
straw yield. 

Since the block variations have been eliminated from the treatment 
comparisons, we are only concerned with the treatment and 
residual variations, and may add the corresponding degrees of 
freedom and sums of squares and products in Table 11.2 to give 
new totals, within blocks. Let y'' be the grain and x'' the straw 
yields measured as deviations from the block means, and let the 
above sums of squares and products be $Sx''^2$, $Sy''^2$ and $Sx''y''$. 
The sum of squares of deviations of y'' from the one line is, according 
to equation (7.9Z), p. 171: 

$$Sy''^2 - \frac{(Sx''y'')^2}{Sx''^2}.$$ 

This is entered in the first row of Table 11.21. Before fitting the 
regression line, there were 56 degrees of freedom for this total, and 
since the line absorbs one, 55 remain for the deviations. 

In fitting the separate regression lines for the treatments, we may 
assume the correlation between variations within the treatments in 
grain and straw yields to be the same for all treatments, and the 
lines will have the same slope, but will lie at different levels of y. 
Indeed, they will go through the separate treatment means, and the 
various sums of squares and products will be those given in the 
residual row of Table 11.2. The sum of squares of deviations from 
these lines is 

$$Sy'^2 - \frac{(Sx'y')^2}{Sx'^2},$$ 

and this is entered in the second row of Table 11.21. The degrees of 
freedom are one fewer than those for the residual in Table 11.2, 
due to the correction for the common slope of the regression lines. 
The effect of allowing for the treatments is measured by subtracting 
the residual sum of squares and degrees of freedom in Table 11.21 
from the “Total” and estimating the corresponding variance; this* 
is given in the last row of Table 11.21. The variance due to treatments 
is now significantly greater than the residual, so that after 
correction for the straw yield, the treatments have a significant effect 
on grain yield. This is partly because of the considerable reduction 
in the residual variance resulting from the important partial correlation 
between the two yields. We make no attempt to give a biological 
interpretation of this result. 

* This has been mentioned in Section 8.35. If a common slope may not 
be assumed, the residuals from the following two systems of lines may be 
compared: (1) lines having different slopes but a common constant b 
[equation (7.7), p. 167] and (2) lines with separate slopes and constants b. 
The method of fitting the first system is not given. 

TABLE 11.21 

Analysis of Variance of Grain Yield after Correction for Regression 
on Straw Yield 

Source of Variations        Sum of Squares    Degrees of Freedom    Variance 
Total within blocks           123 825.1              55              2 251.4 
Residual                       89 026.1              48              1 854.7 
Treatments (difference)        34 799.0               7              4 971.3 

If a is the regression of grain on straw yield for the residual 
deviations as estimated from the third row of Table 11.2, the treatment 
means corrected for variations in straw yield are 

$$\bar{y}_s\ (\text{corrected}) = \bar{y}_s - a(\bar{x}_s - \bar{x}).$$ 

Here, 

$$a = \frac{Sx'y'}{Sx'^2} = 0.818\,91.$$ 

If a were the true population value, the standard error of the difference 
between any two corrected means would be calculable from 
the corrected residual variance in Table 11.21. As it is, the error 
in a also contributes somewhat to the standard errors of the corrected 
treatment means. It is for this reason that the variance due to treatments 
in Table 11.21 cannot be obtained directly from the corrected 
means. 
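
The arithmetic of Table 11.21 can be retraced in a few lines. This is a sketch only: the sums of squares, degrees of freedom and the slope 0.818 91 are copied from the text, while the treatment and general means passed to `corrected_mean` are hypothetical values chosen for illustration. 

```python
# Sums of squares and degrees of freedom copied from Table 11.21.
total_ss, total_df = 123825.1, 55   # deviations from the single line
resid_ss, resid_df = 89026.1, 48    # deviations from the parallel treatment lines

# The treatment row is obtained by difference.
treat_ss = total_ss - resid_ss
treat_df = total_df - resid_df

total_var = total_ss / total_df
resid_var = resid_ss / resid_df
treat_var = treat_ss / treat_df
variance_ratio = treat_var / resid_var   # compared with tables of the variance ratio


def corrected_mean(y_s, x_s, x_bar, a=0.81891):
    """Treatment mean of grain corrected to the general mean straw yield."""
    return y_s - a * (x_s - x_bar)


# Hypothetical means, for illustration only (not from Table 11.1).
example = corrected_mean(y_s=650.0, x_s=300.0, x_bar=290.0)

print(f"treatments: SS = {treat_ss:.1f}, d.f. = {treat_df}, variance = {treat_var:.1f}")
print(f"residual variance = {resid_var:.1f}, ratio = {variance_ratio:.2f}")
print(f"corrected mean (hypothetical inputs) = {example:.2f}")
```

A treatment whose straw yield lies above the general mean has its grain mean lowered by the correction, and one below has it raised. 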

A discussion of the practical aspects of this application of the 
analysis of co-variance technique to agricultural experiments is given 
by Fisher (1936), Sanders and Wishart (1935). 

Elimination of a Linear Factor 

11.2. In section 11.1, when finding the partial correlation between 
grain and straw yields with block and treatment effects eliminated, 
we measured the yields as deviations from discrete block and treat- 
ment means and correlated the deviations. Sometimes, however, the 
factor to be eliminated can be described quantitatively and its 
relation to the variates to be correlated can be expressed by regres- 
sion lines instead of by discrete means. Then the deviations from 
these lines may be correlated. We shall deal with partial correlation 
when the regression lines are straight. 

If the quantities are x, y and z and z is to be eliminated, let the 
values of x and y lying on the regression lines relating them to z 
be $X_z$ and $Y_z$. Then equation (11.11) becomes 

$$S(x - \bar{x})(y - \bar{y}) = S(x - X_z)(y - Y_z) + S(X_z - \bar{x})(Y_z - \bar{y}).$$ 

The deviations from the regression lines may be found explicitly, 
and from them the partial correlation coefficient, or if the regressions 
are linear, the following formula may be used, 

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\,r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}, \qquad (11.3)$$ 

where $r_{xy.z}$ is the partial correlation coefficient between x and y, with 
z eliminated, and $r_{xy}$, $r_{xz}$ and $r_{yz}$ are the three total correlation 
coefficients between x, y and z. This equation is proved in the 
appendix to this chapter. By interchanging the suffixes, the partials 
between x and z and between y and z can be found from the 
same formula. In using equation (11.3), tables by Miner (1922) of 
$\sqrt{1 - r^2}$ for different values of r will be found very useful. 

Mumford and Young (1923) give the correlation coefficients in 
Table 11.4, based on measurements taken on i no boys; we will 
assume the regressions to be linear. It would appear at first sight that 
there is a strong tendency for tall boys to have a large vital capacity (the 
correlation coefficient is 0.835), but the older boys tend to be taller 
and (as might be expected) to have a larger vital capacity. It is reasonable, 
therefore, to suppose that a good deal of the apparent association 
between vital capacity and height may be due to the fact that the 
taller boys are also older, and before the true connection can be found, 
correction must be made for age. Similar arguments apply to the 
correlations between vital capacity and weight and between height 
and weight, and, using equation (11.3) to find the three partial 
correlations with age eliminated, we obtain the results in the upper 
part of Table 11.41. The first three coefficients show that the relationships 
between vital capacity, height and weight continue to exist, 
even though the age is kept constant, although they are a little less 
strong. We are still, however, far from a complete knowledge; part 
of the association between vital capacity and height may really be 
an indirect consequence of the fact that tall boys are also heavy, and 
weight may be the determining factor, or, conversely, height may 
have the direct and weight the indirect connection with vital capacity. 
Again we may use the formula of equation (11.3) to eliminate further, 
and in turn, weight and height, and the results are in the lower part 
of Table 11.41. The correlations are now much reduced, but are 
still real, and it appears that the association of vital capacity with 
height is a little less important than that with weight. Thus, by 
means of equation (11.3), any number of factors can be eliminated 
one at a time, and ultimate partial correlation coefficients can be 
found. In using the formula, it is advisable to retain more significant 
figures than will be required in the final answer, as errors due to 
too drastic approximation tend to multiply themselves. Mumford 
and Young in their paper consider other factors (stem length, chest 
girth), and the conclusions even of the second part of Table 11.41 
cannot be regarded as final. 

TABLE 11.4* 

Correlation Coefficients 

Vital capacity and height     +0.835 
Vital capacity and weight     +0.851 
Vital capacity and age        +0.662 
Weight and age                +0.701 
Height and age                +0.714 
Height and weight             +0.897 

* We only include a portion of the data in the paper. 

TABLE 11.41 

Partial Correlation Coefficients 

Vital capacity and height (age eliminated)               +0.690 
Vital capacity and weight (age eliminated)               +0.724 
Height and weight (age eliminated)                       +0.794 

Vital capacity and height (weight and age eliminated)    +0.271 
Vital capacity and weight (height and age eliminated)    +0.399 
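
The repeated use of equation (11.3) on the coefficients of Table 11.4 can be sketched as follows. The function name `partial` and the variable names are illustrative choices, not part of the original. Carried through without intermediate rounding, the calculation reproduces the three first-order values and the +0.399 of Table 11.41; the remaining second-order value comes out nearer 0.28 than the printed +0.271, and the text does not say what accounts for that small difference. 

```python
import math


def partial(r_xy, r_xz, r_yz):
    """Equation (11.3): correlation of x and y with z eliminated."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Total correlations from Table 11.4
# (v = vital capacity, h = height, w = weight, a = age).
r_vh, r_vw, r_va = 0.835, 0.851, 0.662
r_wa, r_ha, r_hw = 0.701, 0.714, 0.897

# First order: age eliminated.
r_vh_a = partial(r_vh, r_va, r_ha)
r_vw_a = partial(r_vw, r_va, r_wa)
r_hw_a = partial(r_hw, r_ha, r_wa)

# Second order: the same formula applied to the first-order coefficients,
# keeping full precision rather than the three-figure tabulated values.
r_vh_wa = partial(r_vh_a, r_vw_a, r_hw_a)   # weight and age eliminated
r_vw_ha = partial(r_vw_a, r_vh_a, r_hw_a)   # height and age eliminated

print(f"age eliminated:            {r_vh_a:+.3f} {r_vw_a:+.3f} {r_hw_a:+.3f}")
print(f"age and a second factor:   {r_vh_wa:+.3f} {r_vw_ha:+.3f}")
```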


Elimination of a Non-linear Factor 

11.3. If the regression between any two factors differs much from 
linearity, equation (11.3) is not valid, although it may often be used 
as a fair approximation when the departure from linearity, although 
statistically significant, is not very great. It is always possible, of 
course, to fit curved regression lines of x and y on z, to find the actual 
deviations of x and y from those lines and to correlate the deviations, 
but in such circumstances it is obviously desirable (and necessary 
if tests of significance are to be applied) that both x and y should 
be fitted in the same way and to the same degree. Any method of 
smoothing may be used to determine the regression curve, but some 
such form as the polynomial, which can be found by the method of 
least squares, is preferable so that the number of degrees of freedom 
absorbed in fitting may be known. If the polynomial is used (say) to 
the third degree in z, and x and y are the variables which are to be 
partially correlated, it is permissible to regard x, y, z, z² and z³ as 
five variables related linearly (finding the individual values of z² 
and z³ from those of z), to correlate them in every possible way, 
and then to eliminate z³, z² and z in turn by repeated applications 
of equation (11.3) as shown in the last section. If there is only one 
value of the dependent variable to every one of that which is to 
be eliminated, and a polynomial can be fitted according to Fisher’s 
system as outlined in section 9.31, the operation of correlating 
residuals is very easy; we deal with it in the next section. 
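
The equivalence described above, between eliminating z³, z² and z in turn and correlating the residuals from cubic regressions, can be illustrated with a short sketch. The data are randomly generated and purely hypothetical, and the helper names (`eliminate`, `residuals`) are assumptions made for this example; the least-squares cubic is fitted by Gram-Schmidt orthogonalisation, which is one convenient route and not necessarily the one the author has in mind. 

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    return dot(du, dv) / math.sqrt(dot(du, du) * dot(dv, dv))


def partial(r_xy, r_xz, r_yz):
    """Equation (11.3)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def eliminate(R, k):
    """Eliminate variable k from every remaining pair of a correlation matrix."""
    keep = [i for i in range(len(R)) if i != k]
    return [[1.0 if i == j else partial(R[i][j], R[i][k], R[j][k]) for j in keep]
            for i in keep]


def residuals(v, basis):
    """Remove from v its projection on mutually orthogonal basis vectors."""
    out = list(v)
    for b in basis:
        coef = dot(v, b) / dot(b, b)
        out = [o - coef * bi for o, bi in zip(out, b)]
    return out


# Hypothetical data: z is the factor to be eliminated.
random.seed(1)
z = [float(t) for t in range(-6, 7)]
common = [random.gauss(0, 1) for _ in z]
x = [0.5 * t + 0.10 * t ** 2 - 0.02 * t ** 3 + c + random.gauss(0, 1)
     for t, c in zip(z, common)]
y = [-0.3 * t + 0.05 * t ** 2 + 0.01 * t ** 3 + c + random.gauss(0, 1)
     for t, c in zip(z, common)]

# Route 1: treat x, y, z, z^2, z^3 as five variables and apply (11.3) repeatedly.
variables = [x, y, z, [t ** 2 for t in z], [t ** 3 for t in z]]
R = [[corr(u, v) for v in variables] for u in variables]
for k in (4, 3, 2):                 # eliminate z^3, then z^2, then z
    R = eliminate(R, k)
r_repeated = R[0][1]

# Route 2: fit cubics in z by least squares and correlate the residuals.
basis = []
for column in ([1.0] * len(z), variables[2], variables[3], variables[4]):
    basis.append(residuals(column, basis))   # Gram-Schmidt orthogonalisation
r_residual = corr(residuals(x, basis), residuals(y, basis))

print(f"repeated elimination: {r_repeated:+.6f}")
print(f"residual correlation: {r_residual:+.6f}")
```

The two routes agree to within rounding error, which is why the choice between them is one of convenience. 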


11.31. We found constants A, B, C, etc., of the polynomial [equation 
(9.6), p. 197] and said that the sums of squares of the residuals were 
found by subtracting in turn from the crude sum of squares of the 
observations, quantities $NA^2$, etc. (section 9.31); similarly the sums 
of products of the residuals after fitting any number of terms of the 
polynomial are obtained by subtracting successively from the crude 
sum of products the quantities, 

$$NA_1A_2,\quad \frac{N(N^2-1)}{12}B_1B_2,\quad \frac{N(N^2-1)(N^2-4)}{180}C_1C_2,\quad \frac{N(N^2-1)(N^2-4)(N^2-9)}{2\,800}D_1D_2, \qquad (11.4)$$ 

(paying due regard to sign) where $A_1$, $B_1$, $C_1$, etc., are the constants of 
the polynomial of the first factor on the eliminated one and $A_2$, $B_2$, $C_2$, 
etc., are the constants of the polynomial relating the second factor 
to the eliminated one. If a curve of zero order is fitted, i.e. if the 
deviations are measured from the mean, the sum of products of 
residuals is $S(xy) - NA_1A_2$, and this is the ordinary method of 
correcting to the mean by subtracting $N\bar{x}\bar{y}$; the subtraction of the 
other terms for curves of higher order may be regarded as equivalent 
to correcting to a moving mean. These product terms are just as 
much part of the analysis of co-variance as the square terms are of 
the analysis of variance, and we have inserted them in Table 11.5 
for the two series of protein percentages, the original data being 
in Table 8.1, p. 174, and the constants being given on p. 199; 
Table 11.5 corresponds exactly to Table 9.4 (p. 200), and the two 
should be read in conjunction. It would be a good and convincing 
exercise if the reader were to determine the 58 protein contents 
given by the two cubic regressions, find the actual residuals, and 
see if their sums of squares and products agree with those in Tables 
9.4 and 11.5. As before, the sum of products at any row may be 
used with the corresponding sums of squares to determine a partial 
correlation coefficient. We will now use these data to illustrate another 
type of statistical problem. 


Spurious Correlation of Time Series 

11.4. An investigator may ask if there is any common factor (perhaps 
weather) which affects the protein contents of both series of corn in 
the same way, and it is natural to use the correlation coefficient to 
examine this question. The crude correlation coefficient as found 
from the “totals” rows of Tables 9.4 and 11.5 is −0.638, and 
would at first sight appear to indicate that if the common factor 
affects protein content, it acts on the two series in opposite ways. 
We have to remember, however, that because of the way in which 
seed was selected for the two series, one increased in protein content 
while the other decreased, and this predominating effect gives a 
negative correlation coefficient. In order to investigate the effect of 
the possible common factor, we must first eliminate the variation in 
protein with time (due to seed selection), and then correlate the 
residuals. First, we will assume as an approximation that the regressions 
on time are linear, with correlation coefficients of +0.862 
and −0.708, and using equation (11.3) (time is z and the protein 
contents are x and y), find the partial correlation to be 

$$\frac{-0.638 + 0.862 \times 0.708}{\sqrt{(1 - 0.862^2)(1 - 0.708^2)}} = -0.08,$$ 

which, on such a small sample, is not significant. We have seen, 
however, that the regression of the two protein contents on time 
departs considerably from linearity, and equation (11.3) is useful only 
as an approximation; but we may use the analyses of Tables 9.4 
and 11.5 to correlate the residuals after fitting cubic equations. This 
correlation coefficient is 

$$\frac{+3.52}{\sqrt{13.34 \times 12.06}} = +0.29,$$ 

and although still too small to be significant on such a small sample, 
does suggest that some common factor varying from year to year 
may have been affecting the annual residual fluctuations of protein 
yield in the two series in the same way.* 

TABLE 11.5 

Analysis of Co-variance of Protein Content 

Source of Variation          Degrees of Freedom    Sums of Products 
First order regression               1                 −33.01 
Second order regression              1                  −2.86 
Third order regression               1                  −2.18 
Residual                            25                  +3.52 
Total                               28                 −34.53 

* We have treated the first observation in each of the two series as 
independent, but they are actually the same protein content of a common crop. 
This does not affect the constants much, nor, in this instance, our 
conclusions. 

Such problems, in which it is desired to correlate two series of 
observations extended in time or space, have received a good deal of 
attention from statisticians; the example in section 7.3 of the correlation 
between age at death and the proportion of marriages contracted 
in the Church of England comes into this category. Essentially the 
problem is a special case of partial correlation, with time (or space) 
as the variable to be eliminated, and the discussion has centred 
round the development of suitable means of expressing the regression 
and obtaining residuals; there is most difficulty in economic 
statistics where factors are apt to vary periodically (or in waves), 
but for most biological data extending over only a few years, the 
method we have outlined is adequate. Even if no such obvious effect 
as that in the example just worked is present, the possibility of some 
unsuspected time-effect in data collected over some period of time 
should always be considered. 
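
The two coefficients worked out in this section can be retraced as below. This is a sketch using only figures quoted in the text and in Table 11.5; the variable names are illustrative. With the residual sums of squares as printed (13.34 and 12.06), the second coefficient evaluates to about 0.28, close to the +0.29 given above. 

```python
import math


def partial(r_xy, r_xz, r_yz):
    """Equation (11.3): correlation of x and y with z eliminated."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Linear approximation: crude correlation of the two series, and the
# correlation of each series with time.
r_linear = partial(-0.638, 0.862, -0.708)

# After fitting cubics: residual sum of products (Table 11.5) divided by the
# root of the product of the two residual sums of squares (Table 9.4).
r_cubic = 3.52 / math.sqrt(13.34 * 12.06)

# The component sums of products of Table 11.5 should add up to the total.
rows = [-33.01, -2.86, -2.18, 3.52]
total = sum(rows)

print(f"time eliminated linearly:   {r_linear:+.3f}")
print(f"residuals from cubic fits:  {r_cubic:+.3f}")
print(f"total sum of products:      {total:+.2f}")
```

The sign of the association changes from negative to positive once the curved trend is removed, although neither value is significant on so small a sample. 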


Partial and Multiple Regression 

11.5. With every partial correlation coefficient there goes a partial 
regression coefficient, which is obtained by dividing the sum of 
products by the appropriate sum of squares of the independent 
variable and which expresses the amount of change in one factor 
associated with a second when the other factors are constant. The 
regression of grain on straw yield (Table 11.2) for the residual 
variations is the value of a on p. 250, viz. $Sx'y'/Sx'^2$ with 
$Sx'^2 = 71\,496.1$, which gives 0.818 91 
unit of grain associated with a change of 1 unit of straw per plot. 

When there are several quantities all linearly related, partial 
regressions are best found by fitting a multiple regression equation. 
If y is the dependent variable and $x_1$, $x_2$, $x_3$ . . . are the independent 
variables, the linear regression of y on the x's is expressed by an 
equation of the form,* 

$$Y' = ax_1' + bx_2' + cx_3' + \ldots \qquad (11.5)$$ 

Equation (11.5) is called the multiple regression formula of y on $x_1$, $x_2$, 
$x_3$, etc., and the constants a, b, c, etc., are the partial regression 
coefficients, showing how much unit changes in the individual x 
variables affect y independently and directly. The constants, a, b, c, 
etc., may be found by the method of least squares, and this leads to 
a series of linear equations, 

$$Sy'x_1' = a\,Sx_1'^2 + b\,Sx_1'x_2' + c\,Sx_1'x_3' + \ldots$$ 
$$Sy'x_2' = a\,Sx_1'x_2' + b\,Sx_2'^2 + c\,Sx_2'x_3' + \ldots$$ 

* x' and y' are deviations from the grand means. 

The summations have to be found, of course, from the data, and the 
equations have to be solved for a, b, c, etc. These are not unlike 
equations (9.4) given on p. 195 for determining the constants of a 
curved regression line; and indeed the two processes are somewhat 
the same, for in the earlier one we are finding the multiple regression 
of y on x, x², x³, etc. 

We shall illustrate this by finding from the data of Table 11.6 
(Harris, 1931) a formula for predicting the loaf volume from the 
protein content of the flour and the percentage of it extracted by 
potassium bromide solution. The forty-four wheats were analysed 
and subjected to the baking test to give the data of the table. If we 
call the loaf volume y, and the protein percentages $x_1$ and $x_2$, the 
sums of squares and products are as given in the following equations, 

$$1\,462.33 = 107.55\,a - 161.64\,b,$$ 
$$-2\,022.85 = -161.64\,a + 473.19\,b.$$ 

These, solved for a and b, lead to the regression equation, 

loaf volume in c.c. = 14.738 5 × crude protein per cent. 
                      + 0.759 7 × protein extracted by KBr per cent., 

all quantities being measured as deviations from their means. 
If we were to use the crude protein percentage only, a would then be 
the ordinary linear regression, and equal to 

$$\frac{1\,462.33}{107.55} = 13.6 \text{ c.c.},$$ 

for a change of 1 per cent. in protein. 
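
The solution of the two normal equations can be checked directly. The sketch below assumes the right-hand sides are 1 462.33 and −2 022.85, which is the reading that the printed coefficients 14.738 5 and 0.759 7 require; the helper `solve_2x2` is an illustrative name, and Cramer's rule is used simply because there are only two unknowns. 

```python
def solve_2x2(a11, a12, a21, a22, c1, c2):
    """Solve a11*a + a12*b = c1 and a21*a + a22*b = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the normal equations are singular")
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a21 * c1) / det


# Sums of squares and products, all as deviations from the means
# (x1 = crude protein per cent., x2 = protein extracted by KBr per cent.).
S_x1x1, S_x1x2, S_x2x2 = 107.55, -161.64, 473.19
S_yx1, S_yx2 = 1462.33, -2022.85

a, b = solve_2x2(S_x1x1, S_x1x2, S_x1x2, S_x2x2, S_yx1, S_yx2)

# Ordinary regression on crude protein alone, for comparison.
simple = S_yx1 / S_x1x1

print(f"partial regression coefficients: a = {a:.4f}, b = {b:.4f}")
print(f"simple regression on crude protein: {simple:.1f} c.c. per 1 per cent.")
```

The partial coefficient for crude protein (about 14.7) differs from the simple one (about 13.6) because the two protein measures are themselves negatively correlated. 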


The Multiple Correlation Coefficient 

11 . 6 . Just as we need the correlation coefficient to give an idea of 
the accuracy of a linear regression formula with a single term, so we 
need a constant to specify the accuracy of a multiple regression 
formula, and this is provided by the multiple correlation coefficient, 

R 
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This may be found by correlating the predicted values (F) given by 
the formula with the actual ones (y); or alternatively, the sum of 
squares of the deviations of 3^ from the regression is (i R^) multi- 
plied by the sum of squares of the deviations of y from the grand 
mean, where R is the multiple correlation coefficient. Thus, multiple 
coigrelation is another case of the analysis of variance, analysing 


TABLE 11.6

  Loaf      Crude      Protein        Loaf      Crude      Protein
  Volume    Protein,   Extracted by   Volume    Protein,   Extracted by
  (c.c.)    Per cent.  KBr,           (c.c.)    Per cent.  KBr,
                       per cent.*                          per cent.*

  540       12.0       31.2           472       12.6       21.8
  475       11.5       31.4           500       10.4       31.6
  450       11.0       31.3           520       13.8       28.8
  455       12.6       27.9           525       13.6       26.7
  465       11.9       31.4           490       11.5       30.8
  470       12.0       30.2           505       10.6       31.5
  440       11.7       31.9           475       10.7       31.5
  490       12.1       31.8           545       14.2       28.8
  475       12.9       31.0           534       13.4       27.4
  465       11.3       31.7           490       11.0       29.3
  520       14.7       28.3           507       10.8       29.4
  525       15.2       20.5           530       12.8       28.2
  450       10.5       28.7           505       12.2       29.7
  500       13.1       27.7           495       12.5       28.9
  445       10.1       31.8           515       12.4       30.2
  443       11.2       28.9           505       14.7       21.4
  463       11.3       25.5           535       14.5       22.7
  446       11.5       30.0           445        9.0       34.9
  500       12.9       28.5           470       11.2       29.7
  487       10.4       25.5           435        9.1       34.0
  502       12.7       25.5           475       10.8       29.2
  529       16.0       21.8           505       13.9       26.0

* This is the percentage of the crude protein extracted by KBr.


that of $y$ into two parts, one associated with the regression and the
other a residual, and the degrees of freedom of the former are equal
to the number of terms in the regression equation. We have entered
these terms in Table 11.7, assuming the size of sample to be $N$ and
the number of terms [$x$'s in equation (11.5)] to be $m'$. If there is
only one term on the right-hand side of the regression equation,
$m' = 1$, and Table 11.7 reduces to the form for the correlation of
two variables (Table 7.6, p. 158), where $R = r$. On the other hand,
if we compare Tables 11.7 and 9.3 (p. 193) we see that they are of
essentially the same form, $R^2$ corresponding to $\eta^2$ and $m'$ to
$(m - 1)$, $N$ being the same; thus, the multiple correlation coefficient
and the correlation ratio are essentially analogous constants, both
measuring association, the former between one variate and a number of
others as expressed by a multiple linear regression formula, and the
latter between one variate and another expressed as a series of array
means or as a curved regression line. We have already pointed out the
analogy between the multiple regression formula and the curved
regression line, and now see that it is complete. Like the correlation
ratio, the multiple correlation coefficient is essentially positive, and
becomes larger as the number of degrees of freedom associated with
it ($m'$) increases; indeed, if there were as many constants as there
are independent deviations of $y$ in the sample ($m' = N - 1$), $R$
would equal unity. Hence, all that we have written about the
correlation ratio is true of the multiple correlation coefficient.
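The two descriptions of $R$ given in this section (the correlation between predicted and actual values, and the fraction of the sum of squares of $y$ associated with the regression) can be seen to agree on a small example. The sketch below uses invented data and a two-term regression; the data values and the helper names `dev` and `S` are assumptions made for illustration only.

```python
import math

def dev(values):
    """Return the values measured as deviations from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def S(u, v):
    """Sum of products of two series (the book's S notation)."""
    return sum(p * q for p, q in zip(u, v))

# Invented observations, converted to deviations from their means.
y = dev([10.0, 12.0, 15.0, 11.0, 18.0, 20.0, 17.0])
x1 = dev([1.0, 2.0, 3.0, 2.0, 5.0, 6.0, 4.0])
x2 = dev([3.0, 1.0, 4.0, 1.0, 5.0, 2.0, 6.0])

# Two-term normal equations, solved by Cramer's rule.
s11, s12, s22 = S(x1, x1), S(x1, x2), S(x2, x2)
s1y, s2y = S(x1, y), S(x2, y)
det = s11 * s22 - s12 * s12
a = (s1y * s22 - s12 * s2y) / det
b = (s11 * s2y - s12 * s1y) / det

# Predicted deviations Y from the regression formula.
Y = [a * u + b * v for u, v in zip(x1, x2)]

# Route 1: correlate predicted with actual values.
r_pred = S(Y, y) / math.sqrt(S(Y, Y) * S(y, y))

# Route 2: residual sum of squares is (1 - R^2) times the total.
residual_ss = sum((p - q) ** 2 for p, q in zip(y, Y))
r_squared = 1.0 - residual_ss / S(y, y)

print(round(r_pred, 4), round(math.sqrt(r_squared), 4))
```

Both routes print the same value, and the sum of squares of the predicted values equals `a * s1y + b * s2y`.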

In order to find $R$, it is not necessary to find the individual
regression values of $y$, but the following relation may be used:*

$$SY'^2 = aSy'x_1' + bSy'x_2' + cSy'x_3' + \ldots \qquad (11.7)$$

For our wheat quality and loaf volume data,

$$SY'^2 = R^2Sy'^2 = 14.7385 \times 1462.33 - 0.7597 \times 2022.85 = 20\,015.8,$$

and since $Sy'^2 = 41\,346.8$, $R = 0.697$.

* Squaring equation (11.5),

$$\begin{aligned}
Y'^2 &= ax_1'[ax_1' + bx_2' + cx_3' + \ldots] \\
&\quad + bx_2'[ax_1' + bx_2' + cx_3' + \ldots] \\
&\quad + cx_3'[ax_1' + bx_2' + cx_3' + \ldots] \\
&\quad + \ldots \\
&= a[ax_1'^2 + bx_1'x_2' + cx_1'x_3' + \ldots] \\
&\quad + b[ax_1'x_2' + bx_2'^2 + cx_2'x_3' + \ldots] \\
&\quad + c[ax_1'x_3' + bx_2'x_3' + cx_3'^2 + \ldots] \\
&\quad + \ldots
\end{aligned}$$

and summing,

$$\begin{aligned}
SY'^2 &= a[aSx_1'^2 + bSx_1'x_2' + cSx_1'x_3' + \ldots] \\
&\quad + b[aSx_1'x_2' + bSx_2'^2 + cSx_2'x_3' + \ldots] \\
&\quad + c[aSx_1'x_3' + bSx_2'x_3' + cSx_3'^2 + \ldots] \\
&\quad + \ldots
\end{aligned}$$

Equation (11.7) follows from this when the values of equation (11.6) are
substituted for the terms in the brackets.

Correlating the loaf volume with the crude protein content only,
we find $r = +0.694$, and thus see that although it absorbs one
more degree of freedom, the multiple regression equation is scarcely
better than that using protein content as a single factor. We deal
with tests for the significance of such a difference in section 11.72.
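The use of equation (11.7) for the loaf volume data can be followed step by step. This is a sketch under stated assumptions: the sum of products of loaf volume with the KBr protein is taken as −2022.85 (the value consistent with the quoted regression coefficients), and the variable names are illustrative.

```python
import math

# Regression coefficients and sums of squares/products quoted in the text.
a, b = 14.7385, 0.7597
S_y_x1 = 1462.33      # loaf volume with crude protein
S_y_x2 = -2022.85     # loaf volume with KBr protein (assumed value, see lead-in)
S_x1_x1 = 107.55      # crude protein with itself
S_y_y = 41346.8       # total sum of squares of loaf volume

# Equation (11.7): sum of squares associated with the regression.
ss_regression = a * S_y_x1 + b * S_y_x2

# Multiple correlation coefficient, and the simple one for crude protein alone.
R = math.sqrt(ss_regression / S_y_y)
r_simple = S_y_x1 / math.sqrt(S_x1_x1 * S_y_y)

print(round(ss_regression, 1), round(R, 3), round(r_simple, 3))
```

Computed this way, $R$ comes out very slightly below the printed 0.697, a difference attributable to rounding in the quoted figures; the conclusion that $R$ barely exceeds $r$ is unaffected.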

Errors of Sampling in Multiple and Partial Correlation 
Partial Correlation Coefficient 

11.71. Fisher (1934^) has shown that the distributions of the partial
and the total correlation coefficients are the same, when they are
estimated from deviations having the same number of degrees of
freedom. Thus, if there are $N$ pairs of observations and $m$ factors
have been eliminated (leaving $N - m - 1$ degrees of freedom),
the partial correlation coefficient is distributed like that of the total
coefficient based on $(N - m)$ pairs of observations.

The residual correlation coefficient between grain and straw yield
in Table 11.3 is based on 49 degrees of freedom and may be treated
like an ordinary correlation based on 50 pairs of observations, while
the correlation for treatment variations is based on 7 degrees of
freedom, and may be regarded as arising from a sample of 8; we
will use Fisher's $z'$ transformation (section 8.3) to test the
significance of the difference between them. The values of $z'$ are 0.681
and −0.347 with a difference of 1.028; the standard error of this is

$$\pm\sqrt{\tfrac{1}{47} + \tfrac{1}{5}} = \pm 0.47,$$

and the difference is significant.
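The comparison of the two transformed correlations can be reproduced in a few lines. This is a minimal sketch: the function name `se_z_difference` is hypothetical, and it relies on the standard result that $z'$ from a sample of $n$ pairs has a standard error of $1/\sqrt{n - 3}$.

```python
import math

def se_z_difference(pairs_1, pairs_2):
    """Standard error of the difference between two independent z' values."""
    return math.sqrt(1.0 / (pairs_1 - 3) + 1.0 / (pairs_2 - 3))

# z' values quoted in the text for the residual and treatment correlations.
z_residual, z_treatment = 0.681, -0.347

# Effective sample sizes: 49 and 7 degrees of freedom, treated as 50 and 8 pairs.
difference = z_residual - z_treatment
se = se_z_difference(50, 8)
ratio = difference / se

print(round(difference, 3), round(se, 2), round(ratio, 2))
```

The difference is a little over twice its standard error, which is why it is judged significant.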

We found in section 11.4 that the partial correlation coefficient
between the residuals of protein contents of two series of corn after
the time trend had been eliminated by a cubic was +0.39, and
since there were 39 observations with four terms eliminated
(including the mean) we must test this as though it were based on
36 pairs of observations. Using Fisher's table of values of the
correlation coefficient at different levels, we should enter it at $n = 34$;
there is no value at this point, but we see that for $n = 35$, $r = 0.33$
lies on the 0.1 level, and our observed value, being less even than
this, is not significant.

For the same reasons as advanced in section 10.1 in connection
with the mean, the sampling error of the correlation coefficient is
only given by the usual formulae when the correlation is between
homogeneous and independent deviations arrived at after a complete
analysis of co-variance.


Multiple Correlation Coefficient 

11.72. In order to test the significance of a multiple correlation
coefficient we may use the analogy with non-linear regression and
employ the same test. From the analysis of variance of Table 11.7
we find

$$z = \tfrac{1}{2}\log_e\frac{R^2(N - m' - 1)}{(1 - R^2)m'}$$

and enter Fisher's tables at $n_1 = m'$ and $n_2 = (N - m' - 1)$.
Wishart (1928) gives a table of values of the multiple correlation
coefficient lying on the 0.05 and 0.01 levels of significance.

For the data of Table 11.6 (loaf volumes and wheat qualities),
$R^2 = 0.485$, $m' = 2$, $N = 44$,

$$z = \tfrac{1}{2}\log_e\frac{0.485 \times 41}{0.515 \times 2} = 1.48,$$

and this is even greater than the value of $z$ lying on the 1 per cent.
level when $n_1 = 2$ and $n_2 = 30$ (i.e. greater than 0.84). Hence this
multiple correlation coefficient is significantly greater than zero.
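The value of $z$ for the loaf volume data follows directly from the three quantities given. The sketch below is illustrative; the function name `z_multiple_correlation` is hypothetical, and the logarithm is the natural logarithm.

```python
import math

def z_multiple_correlation(r_squared, n_obs, n_terms):
    """Fisher's z for a multiple correlation, with its two degrees of freedom."""
    n1 = n_terms                  # degrees of freedom of the regression
    n2 = n_obs - n_terms - 1      # degrees of freedom of the residual
    variance_ratio = (r_squared / n1) / ((1.0 - r_squared) / n2)
    return 0.5 * math.log(variance_ratio), n1, n2

z, n1, n2 = z_multiple_correlation(0.485, 44, 2)
print(round(z, 2), n1, n2)
```

The residual here has 41 degrees of freedom; the comparison in the text is made against the tabulated value for $n_2 = 30$, which is the more stringent of the neighbouring table entries.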

We may also wish to test if a regression formula with a large
number of constants gives a significantly closer prediction than one
using fewer, and this may be done in exactly the same way as in
section 9.2, where we tested for non-linearity of regression. If there
are two regression formulae involving $m_1'$ and $m_2'$ constants, $m_1'$
being greater than $m_2'$, and the multiple correlation coefficients are
$R_1$ and $R_2$, the complete analysis of variance is as shown in Table
11.71, and since all the sums of squares are positive, $R_1$ must always
be greater than $R_2$. If, however, the difference between $R_1$ and $R_2$
is the result of a real effect, the variance for the difference between
regressions will be significantly greater than the residual, and using
Fisher's test, we see if

$$z = \tfrac{1}{2}\log_e\frac{(R_1^2 - R_2^2)(N - m_1' - 1)}{(1 - R_1^2)(m_1' - m_2')}$$

is significant for degrees of freedom,

$$n_1 = m_1' - m_2' \quad\text{and}\quad n_2 = N - m_1' - 1.$$

Applying this to the data of Table 11.6, for protein content
alone $R_2^2 = 0.482$ $(m_2' = 1)$, while for the multiple regression,
$R_1^2 = 0.485$ $(m_1' = 2)$; hence

$$z = \tfrac{1}{2}\log_e\frac{0.003 \times 41}{0.515} = -0.716,$$

and certainly is not significant. Thus, the extraction of some of the
protein by potassium bromide has added nothing to the information
provided by the crude protein content alone, as far as the prediction
of loaf volume for this series of flours is concerned.
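The comparison of the one-term and two-term regressions for the loaf volume data can be sketched as follows. The function name `z_difference_of_regressions` is hypothetical; the larger regression is taken as the one with two terms ($R^2 = 0.485$) and the smaller as crude protein alone ($R^2 = 0.482$).

```python
import math

def z_difference_of_regressions(r2_large, r2_small, n_obs, m_large, m_small):
    """Fisher's z comparing the gain from extra terms with the residual variance."""
    n1 = m_large - m_small        # extra degrees of freedom absorbed
    n2 = n_obs - m_large - 1      # residual degrees of freedom
    variance_ratio = ((r2_large - r2_small) / n1) / ((1.0 - r2_large) / n2)
    return 0.5 * math.log(variance_ratio), n1, n2

z_diff, n1, n2 = z_difference_of_regressions(0.485, 0.482, 44, 2, 1)
print(round(z_diff, 3), n1, n2)
```

A negative $z$ means the variance attributed to the extra term is smaller than the residual variance, so there is no question of significance.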


11.73. We will illustrate the further application of these tests by
another example, which presents several features; the data are in
Table 11.8. Fifty groups of varying numbers of cotton hairs were
weighed and measured, and the weights of the groups were expressed
as multiples of $10^{-8}$ grammes per centimetre. These were then swollen
in caustic soda and placed in three classes, A, B and C, according to
their appearance under the microscope. The problem is to determine
if the three classes show real differences in average hair-weight per
centimetre. A regression formula of the form of equation (11.5)
may be obtained to express the weight of a group of hairs in terms
of the number in each of the three classes; if we put $y$ equal to the
group weight, and $x_1$, $x_2$ and $x_3$ the numbers in the three classes,
$a$, $b$ and $c$ are the corresponding mean hair-weights per centimetre.
Similarly a second regression may be obtained in which only the
average hair-weight for all classes, and the total number of hairs
in the group are used; this is the ordinary case with one constant.
Now the former regression, absorbing three degrees of freedom,
will necessarily have associated with it a little more of the variance
in weights of the groups than the second, but if the three groups
really have different hair-weights, the equation with three constants
will fit the data very much better than that with only one, and the
difference in the associated variances will be significantly greater
than zero. We may therefore investigate the problem by finding the
regressions and variances, and testing the significance of the
difference in the latter.



TABLE 11.8.* Numbers of Cotton Hairs and Weights (in grammes)

* Some of the groups have fractional numbers because one or two hairs were lost between weighing and classifying, and
these were assumed to be distributed proportionately in all classes. The weights were supplied as mean weights per
centimetre of hair to the nearest unit, and those in this table are the means multiplied by the number of hairs in the group.
The effect of these approximations in the data is exceedingly small.
To find $a$, $b$ and $c$ we solve equations (11.6), and for these we
need to determine the sums of squares and products from the data.
It is not fair, however, to give all groups, whether large or small,
the same weight in finding these sums. If the number of hairs
in a group is $n$, and the variability of weights of single hairs is
approximately the same in all classes, with a standard deviation of
$\sigma$, the standard error of the mean weight per hair for the group is
$\sigma/\sqrt{n}$, and that of the total weight is $n$ times this, and so is
proportional to $\sqrt{n}$; hence the variance of the total weight of a group
due to variations between hairs within the same class is proportional to
$n$, the number in the group. We may use the results of section 3.7
and regard $1/n$ as the quantity of information; then in making
summations, each term may be weighted with this quantity,
multiplying each product and square by $1/n$ before summing. For example,
the weighted sum,

$$S\frac{yx_1}{n} = \frac{3243 \times 15.4}{23} + \frac{5580 \times 25.0}{30} + \ldots$$

A further modification is necessary; a group with no hairs naturally
has zero weight, so we shall make our regression line pass through
the origin and not through the mean, and shall measure all deviations
from zero; that is, in equations (11.5) and (11.6) we write $x$ and $y$
instead of $x'$ and $y'$. It may assist some to think of the data of
Table 11.8 as a half of the complete data, the other half having
equal negative values, so that the mean is zero, and the sums of
squares and products we obtain are half the total for the imaginary
complete results.
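The two modifications just described (weighting every square and product by $1/n$, and measuring from zero rather than from the mean) can be sketched as below. The three groups are invented for illustration and are not taken from Table 11.8; the helper name `weighted_sum` is likewise an assumption.

```python
def weighted_sum(u, v, n):
    """S(u*v/n): products measured from zero, each weighted by 1/n."""
    return sum(ui * vi / ni for ui, vi, ni in zip(u, v, n))

# Invented groups: numbers of hairs in classes A, B, C and total group weight.
x1 = [15.0, 25.0, 10.0]
x2 = [6.0, 4.0, 9.0]
x3 = [2.0, 1.0, 1.0]
y = [4100.0, 6000.0, 3300.0]
n = [p + q + r for p, q, r in zip(x1, x2, x3)]   # hairs per group

# Coefficients and left-hand sides of the weighted equations (11.6).
classes = [x1, x2, x3]
matrix = [[weighted_sum(p, q, n) for q in classes] for p in classes]
lhs = [weighted_sum(y, p, n) for p in classes]

# With a single class (x = n) the weighted sums collapse to plain totals.
single_lhs = weighted_sum(y, n, n)    # equals the total weight, S(y)
single_coef = weighted_sum(n, n, n)   # equals the total number of hairs, S(n)
print(single_lhs, single_coef, single_lhs / single_coef)
```

The last line shows why the one-constant regression reduces to total weight divided by total number of hairs.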

The appropriate sums of squares and products are set out in
equations below, and their solution yields the regression equation
following them.

$$\begin{aligned}
363\,785.7 &= 1468.85a + 288.83b + 18.42c \\
73\,472.3 &= 288.83a + 74.73b + 4.94c \\
4\,646.0 &= 18.42a + 4.94b + 1.04c
\end{aligned}$$

$$Y = 226.2557x_1 + 114.127x_2 - 82.16x_3$$
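The three simultaneous equations can be solved mechanically. The following is a sketch using ordinary Gaussian elimination, which is not necessarily the method of the original working; `solve_linear` is a hypothetical helper. Because the printed coefficients are rounded to two decimal places, the solution agrees with the quoted constants closely but not to the last figure, which illustrates how sensitive the result is to the precision carried.

```python
def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Bring the largest remaining entry in this column to the pivot row.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution.
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution

coefficients = [
    [1468.85, 288.83, 18.42],
    [288.83, 74.73, 4.94],
    [18.42, 4.94, 1.04],
]
left_hand_sides = [363785.7, 73472.3, 4646.0]

a, b, c = solve_linear(coefficients, left_hand_sides)

# Sum of squares associated with the regression, by equation (11.7).
ss_regression = a * 363785.7 + b * 73472.3 + c * 4646.0
print(round(a, 3), round(b, 2), round(c, 1), round(ss_regression, -3))
```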

Although the data are to some extent approximate, the arithmetic
must be performed with considerable precision if the final results
are to have any accuracy at all. In finding $S(x_1^2/n)$ we calculated
$x_1/n$ to five decimal places, and then summed $x_1 \times x_1/n$ on a
machine; as a check, we then calculated and summed $x_1^2 \times 1/n$. In
performing the divisions for the solution of the simultaneous equations
we had to work to nine or ten significant figures to obtain the accuracy
of the constants shown in the regression. It is not claimed that the four
decimal places for $a$ have any physical meaning, but as a constant
of the sample which will be used to find the sum of squares of group
weights, that accuracy is necessary. Classes A and B have differing
mean hair-weights per centimetre, while the hairs of C appear to
have a negative one; however, there were very few of class C, and
this irrational result is due to sampling variations, and signifies
nothing. To find the sum of squares associated with the regression
we use equation (11.7), and find

$$226.2557 \times 363\,785.7 + 114.127 \times 73\,472.3 - 82.16 \times 4\,646.0 = 90\,312\,000.$$

This could not have been given correct even to five figures if we
had determined the regression coefficients correct only (say) to
three. The weighted sum of squares of the hair-weights is


$$\frac{3243^2}{23} + \frac{5580^2}{30} + \ldots = 91\,220\,148,$$

and we will use this to five significant figures only.

We now have to find a regression equation using one constant,
the mean hair-weight per centimetre for all classes. Using only the
first term and equation of (11.6), and finding the weighted sums of
products and squares, we have

$$S\frac{yx_1}{n} = a'S\frac{x_1^2}{n};$$

since there is only one class, $x_1 = n$, and we obtain $Sy = a'Sn$,
which is the straightforward relationship, mean hair-weight = total
weight of all groups divided by the total number of hairs. This gives
$a' = 203.736$, and the sum of squares of weights associated with
it is $203.736 \times 441\,904 = 90\,032\,000$ (correct to five significant
figures).

The analysis of variance is in Table 11.9, and in reckoning the
degrees of freedom it must be remembered that deviations are not
measured from a mean determined from the data, but from the
origin, so that the fifty groups contribute fifty degrees to the sum of
squares. In order to test the significance of the difference in
regressions, we find

$$z = \tfrac{1}{2}\log_e\frac{140\,000}{19\,300} = 0.99,$$

and this is significant, for when $n_1 = 2$ and $n_2 = 30$, $z = 0.84$
lies on the 1 per cent. level. Thus we conclude that the two classes of
hairs A and B, as determined by swelling in caustic soda, have
significantly different mean hair-weights per centimetre. This result
could not have been obtained directly, because there were no means
of weighing the hairs individually, and it was not practicable to
obtain the individual classes for separate weighing.
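The partition of the sums of squares and the resulting value of $z$ can be laid out explicitly. This is a sketch using the rounded sums of squares quoted in the text; the variable names are illustrative.

```python
import math

# Sums of squares quoted in the text (five significant figures).
ss_total, ss_multiple, ss_simple = 91220000.0, 90312000.0, 90032000.0

# Deviations are measured from the origin, so fifty groups give fifty degrees.
df_total, df_multiple, df_simple = 50, 3, 1

ss_difference = ss_multiple - ss_simple      # gain from using three constants
df_difference = df_multiple - df_simple
ss_residual = ss_total - ss_multiple
df_residual = df_total - df_multiple

var_difference = ss_difference / df_difference
var_residual = ss_residual / df_residual

# Fisher's z: half the natural logarithm of the ratio of the two variances.
z = 0.5 * math.log(var_difference / var_residual)
print(ss_difference, ss_residual, df_residual, round(z, 2))
```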


TABLE 11.9

Analysis of Variance of Weights of Groups of Cotton Hairs

  Source of Variation           Sums of Squares   Degrees of   Variance
                                                  Freedom

  Multiple regression             90 312 000          3
    Simple regression             90 032 000          1        90 032 000
    Difference in regression         280 000          2           140 000
  Residual                           908 000         47            19 300

  Total                           91 220 000         50

11.8. We must emphasise the underlying assumption of the methods
of this chapter that the residuals are homogeneous in the sense that
their variance and co-variance are constant. In the example of
section 11.1, for instance, we assume the relation between residual
straw and grain yields is sensibly the same for all treatments and
blocks, and in the example of section 11.5 illustrating multiple
regression, we are assuming that the relation between loaf volume
and protein extracted is the same for all values of crude protein. The
tests of significance also assume the residual deviations to be normally
distributed.
APPENDIX


Proof of the Formula for the Partial Correlation
Coefficient


Let $x$ and $y$ be two variables, and let $z$ be the third which is to be
eliminated. Then the regressions of $x$ and $y$ on $z$ (measuring all as
deviations from the mean) are

$$X' = r_{xz}\sqrt{\frac{Sx'^2}{Sz'^2}}\,z' \quad\text{and}\quad Y' = r_{yz}\sqrt{\frac{Sy'^2}{Sz'^2}}\,z' \qquad (11.8)$$

and the partial correlation coefficient between $x$ and $y$ is

$$r_{xy.z} = \frac{S(x' - X')(y' - Y')}{\sqrt{S(x' - X')^2\,S(y' - Y')^2}} \qquad (11.9)$$

Expanding the numerator of (11.9),

$$S(x' - X')(y' - Y') = Sx'y' + SX'Y' - Sx'Y' - SX'y',$$

and substituting from (11.8) for $X'$ and $Y'$,

$$\begin{aligned}
S(x' - X')(y' - Y') &= Sx'y' + r_{xz}r_{yz}\sqrt{Sx'^2Sy'^2} - 2r_{xz}r_{yz}\sqrt{Sx'^2Sy'^2} \\
&= Sx'y' - r_{xz}r_{yz}\sqrt{Sx'^2Sy'^2} \\
&= \sqrt{Sx'^2Sy'^2}\,(r_{xy} - r_{xz}r_{yz}) \qquad (11.91)
\end{aligned}$$

Expanding the first term in the denominator of (11.9),

$$S(x' - X')^2 = Sx'^2 + SX'^2 - 2Sx'X',$$

and substituting for $X'$ from (11.8),

$$\begin{aligned}
S(x' - X')^2 &= Sx'^2 + r_{xz}^2Sx'^2 - 2r_{xz}Sx'z'\sqrt{\frac{Sx'^2}{Sz'^2}} \\
&= (1 - r_{xz}^2)Sx'^2 \qquad (11.92)
\end{aligned}$$

and similarly,

$$S(y' - Y')^2 = (1 - r_{yz}^2)Sy'^2 \qquad (11.93)$$

Substituting (11.91), (11.92) and (11.93) in (11.9), we obtain

$$r_{xy.z} = \frac{r_{xy} - r_{xz}r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$
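The result of the proof can be checked numerically by computing the partial correlation in two ways: from the final formula, and directly as the correlation between the residuals of $x$ and $y$ after their regressions on $z$ have been removed. The data below are invented for illustration, and the helper names are assumptions.

```python
import math

def corr(u, v):
    """Ordinary correlation coefficient between two series."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)

def residuals(u, z):
    """Deviations of u from its linear regression on z."""
    mu, mz = sum(u) / len(u), sum(z) / len(z)
    slope = (sum((p - mu) * (q - mz) for p, q in zip(u, z))
             / sum((q - mz) ** 2 for q in z))
    return [(p - mu) - slope * (q - mz) for p, q in zip(u, z)]

def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation of x and y with z eliminated."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented observations.
x = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]
y = [1.0, 3.0, 2.0, 6.0, 5.0, 8.0]
z = [1.0, 2.0, 4.0, 3.0, 6.0, 5.0]

from_formula = partial_corr(corr(x, y), corr(x, z), corr(y, z))
from_residuals = corr(residuals(x, z), residuals(y, z))
print(round(from_formula, 6), round(from_residuals, 6))
```

The two printed values coincide, as the algebra requires.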
TABLE B

Values of χ² on the 0.05 and 0.01 Levels of Significance for
g Degrees of Freedom

   g    χ² 0.05   χ² 0.01      g    χ² 0.05   χ² 0.01      g    χ² 0.05   χ² 0.01

   1     3.841     6.635      11    19.675    24.725      21    32.671    38.932
   2     5.991     9.210      12    21.026    26.217      22    33.924    40.289
   3     7.815    11.341      13    22.362    27.688      23    35.172    41.638
   4     9.488    13.277      14    23.685    29.141      24    36.415    42.980
   5    11.070    15.086      15    24.996    30.578      25    37.652    44.314
   6    12.592    16.812      16    26.296    32.000      26    38.885    45.642
   7    14.067    18.475      17    27.587    33.409      27    40.113    46.963
   8    15.507    20.090      18    28.869    34.805      28    41.337    48.278
   9    16.919    21.666      19    30.144    36.191      29    42.557    49.588
  10    18.307    23.209      20    31.410    37.566      30    43.773    50.892


TABLE C

Values of t on the 0.05 and 0.01 Levels of Significance for
n Degrees of Freedom

   n    t 0.05    t 0.01       n    t 0.05    t 0.01

   1    12.706    63.657      11     2.201     3.106
   2     4.303     9.925      12     2.179     3.055
   3     3.182     5.841      13     2.160     3.012
   4     2.776     4.604      14     2.145     2.977
   5     2.571     4.032      15     2.131     2.947
   6     2.447     3.707      16     2.120     2.921
   7     2.365     3.499      17     2.110     2.898
   8     2.306     3.355      18     2.101     2.878
   9     2.262     3.250      19     2.093     2.861
  10     2.228     3.169      20     2.086     2.845
                              25     2.060     2.787
                              30     2.042     2.750
                               ∞     1.960     2.576

These tables are taken by consent from Statistical Methods for Research
Workers, by Professor R. A. Fisher, published at 16s. by Oliver & Boyd,
Edinburgh, and attention is drawn to the larger collection in Statistical
Tables, by Professor R. A. Fisher and F. Yates, published at 13s. 6d. by
Oliver & Boyd, Edinburgh.












Table for Testing the Significance of the Difference between Estimates of Variance $s_1^2$ and $s_2^2$ ($s_1^2 > s_2^2$)

Based on $n_1$ and $n_2$ Degrees of Freedom

This table, which is given by consent, is calculated from the tables of z in Statistical Methods for Research Workers
by Professor R. A. Fisher, published at 16s. by Oliver & Boyd, Edinburgh, and attention is drawn to the larger collection
in Statistical Tables, by Professor R. A. Fisher and F. Yates, published at 13s. 6d. by Oliver & Boyd, Edinburgh.